Abstract

This paper addresses the problem of estimating the tail index $\alpha$ of distributions with heavy,
Pareto-type tails for dependent data, which is of interest in the areas of finance, insurance,
environmental monitoring and teletraffic analysis. A novel approach based on the max
self-similarity scaling behavior of block maxima is introduced. The method exploits the increasing
lack of dependence of maxima over large size blocks, which proves useful for time series data.

We establish the consistency and asymptotic normality of the proposed max-spectrum
estimator for a large class of $m$-dependent time series, in the regime of intermediate
block-maxima. In the regime of large block-maxima, we demonstrate the distributional consistency
of the estimator for a broad range of time series models including linear processes. The
max-spectrum estimator is a robust and computationally efficient tool, which provides a novel
time-scale perspective to the estimation of the tail-exponents. Its performance is illustrated over
synthetic and real data sets.
1 Introduction

The problem of estimating the exponent in heavy tailed data has a long history in statistics, due 
to its practical importance and the technical challenges it poses. Heavy tailed distributions are 
characterized by the slow, hyperbolic decay of their tail. Formally, a real valued random variable 
$X$ with cumulative distribution function (c.d.f.) $F(x) = \mathbb{P}\{X \le x\}$, $x \in \mathbb{R}$,
is (right) heavy-tailed with index $\alpha > 0$, if

$$\mathbb{P}\{X > x\} = 1 - F(x) \sim L(x)\,x^{-\alpha}, \quad \text{as } x \to \infty, \tag{1.1}$$

where $\sim$ means that the ratio of the left-hand side to the right-hand side in (1.1) tends to 1, as
$x \to \infty$. Here $L(\cdot)$ is a slowly varying function at infinity, i.e. $L(\lambda x)/L(x) \to 1$,
as $x \to \infty$, for all $\lambda > 0$. For simplicity, we suppose that $X$ is almost surely positive,
i.e. $F(0) = 0$, and we also focus on the case when $L(\cdot)$ is asymptotically constant, namely

$$L(x) \sim \sigma_0^{\alpha}, \quad \text{as } x \to \infty, \tag{1.2}$$

for some $\sigma_0 > 0$. The case when the slowly varying function $L(\cdot)$ is non-trivial is discussed
in the Remarks after Theorem 3.1 below.

The tail index (exponent) $\alpha$ controls the rate of decay of the tail of $F$. The presence of
heavy tails in data was originally noted in the work of Zipf on word frequencies in languages
(Zipf (1932)), who also introduced a graphical device for their detection (de Sousa and Michailidis
(2004)). Subsequently, Mandelbrot (1960) noted their presence in financial data. Since the early
1970s heavy tailed behavior has been noted in many other scientific fields, such as hydrology,
insurance claims and social and biological networks (see, e.g. Finkenstadt and Rootzen (2004)
and Barabasi (2002)). In particular, the emergence of the Internet and the World Wide Web gave
a new impetus to the study of heavy tailed distributions, due to their omnipresence in Internet
packet and flow data, the topological structure of the Web, the size of computer files, etc. (see e.g.
Adler et al. (1998), Resnick (1997), Faloutsos et al. (1999), Adamic and Huberman (2000, 2002),
Park and Willinger (2000)). In fact, heavy tailed behavior is a characteristic of highly optimized
physical systems, as argued in Carlson and Doyle (1999).

Heavy tails are also ubiquitous in stock market data. It is well-documented that the
returns of many stocks measured at high-frequency exhibit non-negligible extreme fluctuations,
consistent with a non-Gaussian, heavy-tailed model. The availability of high-frequency
tick-by-tick data reveals a further pronounced presence of heavy tails in the transaction volumes.
Figure 1 shows the volumes associated with all single transactions of the Honeywell Inc. stock
recorded on January 4th, 2005 at the New York Stock Exchange (NYSE) and NASDAQ (see
Wharton Research Data Service (url)). The transactions are ordered by their occurrence in time.
The presence of large spikes indicates heavy tails, similar, for example, to the moving average with
Pareto innovations shown in Figure 2 below.

Some important features of such data are: (i) their large size due to the fine time scale
resolution (high-frequency) at which they are collected, (ii) their temporal structure that introduces
dependence amongst observations, and (iii) their sequential nature, since observations are added
to the data set over time. Traditional methods for estimating the tail index are not well suited for
addressing these issues, as discussed below.

The majority of the approaches proposed in the literature focus on the scaling behavior of the
largest order statistics $X_{(1)} \ge X_{(2)} \ge \cdots \ge X_{(n)}$ obtained from an independent and
identically distributed (i.i.d.) sample $X(1), \ldots, X(n)$ from $F$; typical examples include Hill's
estimator (Hill (1975)) and its numerous variations (Kratz and Resnick (1996), Resnick and Starica
(1997)), and kernel based estimators (Csorgo et al. (1985) and Feuerverger and Hall (1999)). A review
of these methods and their applications is given in de Haan et al. (2000) and de Sousa and
Michailidis (2004).

Figure 1: Transaction volumes for Honeywell Inc. (HON) from the NYSE and NASDAQ consolidated
trades and quotes data base during January 4th, 2005. The observations correspond to the volumes (in
number of shares) per single transaction. The transactions are listed in the order of their occurrence in time.
The observed heavy-tailed behavior of traded volumes is ubiquitous across different trading days and across
the entire spectrum of relatively liquid stocks.



The most widely used in practice is the Hill estimator $\hat\alpha_H(k)$, defined as:

$$\hat\alpha_H(k) := \Big(\frac{1}{k}\sum_{i=1}^{k} \ln X_{(i)} - \ln X_{(k+1)}\Big)^{-1}, \tag{1.3}$$



with $k$, $1 \le k \le n-1$, being the number of included order statistics. The parameter $k$ is typically
selected by examining the plot of the $\hat\alpha_H(k)$'s versus $k$, known as the Hill plot. In practice, one
chooses a value of $k$ where the Hill plot exhibits a fairly constant behavior (see e.g. de Haan et al.
(2000)). However, the use of order statistics requires sorting the data, which is computationally
expensive (it requires at least $O(n\log(n))$ steps) and destroys the time ordering of the data and
hence their temporal structure. Further, as can be seen from the brief review above, most of the
emphasis has been placed on point estimation of the tail index and little on constructing confidence
intervals. Exceptions can be found in the work of Cheng and Peng (2001) and Lu and Peng (2002)
for the construction of confidence intervals and of Resnick and Starica (1995) on the estimation of
$\alpha$ for dependent data.
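As a concrete reference point for (1.3), the following minimal Python sketch computes the Hill estimate from the $k$ largest order statistics. The function names `hill_estimator` and `pareto_sample`, the seed and the choice $k = 2000$ are illustrative assumptions and are not taken from the study itself; the sort inside the function is the $O(n\log(n))$ step mentioned above.

```python
import math
import random


def hill_estimator(data, k):
    """Hill estimate of the tail index from the k largest order statistics."""
    n = len(data)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    # Sort in decreasing order, so x[0] = X_(1) >= x[1] = X_(2) >= ...
    x = sorted(data, reverse=True)
    log_threshold = math.log(x[k])  # ln X_(k+1) in the 1-based notation of (1.3)
    mean_log_excess = sum(math.log(x[i]) for i in range(k)) / k - log_threshold
    return 1.0 / mean_log_excess


def pareto_sample(n, alpha, rng):
    """Exact Pareto sample with P{X > x} = x^(-alpha), x >= 1 (inverse c.d.f.)."""
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


# Illustrative run on i.i.d. Pareto data with tail index 1.5 (assumed setup).
rng = random.Random(0)
sample = pareto_sample(20000, 1.5, rng)
estimate = hill_estimator(sample, 2000)
```

For exact i.i.d. Pareto data the estimate should fall close to the true index; with dependent data or a poorly chosen $k$ this need not be the case.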

The purpose of this study is to introduce a method for estimating the tail index that overcomes
the above listed shortcomings of other techniques. It is based on the asymptotic max self-similarity
properties of heavy-tailed maxima. Specifically, the maximum values of data calculated over
blocks of size $m$ scale at a rate of $m^{1/\alpha}$. Therefore, by examining a sequence of growing, dyadic
block sizes $m = 2^j$, $1 \le j \le \log_2 n$, $j \in \mathbb{N}$, and subsequently estimating the mean of logarithms
of block-maxima (log-block-maxima) one obtains an estimate of the tail index $\alpha$. Notice that by
using blocks of data, the temporal structure of the data is preserved. This procedure requires $O(n)$
operations, making it particularly useful for large data sets; further, the estimates for $\alpha$ can be
updated recursively as new data become available, by using only $O(\log_2 n)$ memory and without
the knowledge of the entire data set, thus making the proposed estimator particularly suitable for
streaming data. Estimators based on max self-similarity for the tail index for i.i.d. data were
introduced in Stoev et al. (2006), where their consistency and asymptotic normality were established. In
this paper, we extend them to dependent data, prove their consistency, examine and illustrate their
performance using synthetic and real data sets and discuss a number of implementation issues.

The remainder of the paper is structured as follows: in Section 2 the max-spectrum estimators
are introduced. Their consistency and asymptotic normality are established in Section 3.1 for
$m$-dependent processes. The distributional consistency of the estimators is established in Section
3.2 for a large class of time series models (including linear processes) under a mild asymptotic
independence condition. The construction of confidence intervals is further addressed in Section
3.3. The important problem of automatic selection of parameters is addressed in Section 4.
Applications to financial time series are discussed in Section 5, while most technical proofs are given in
the Appendix.

2 Max self-similarity and tail exponent estimators 

Here we introduce the max self-similarity estimators for the tail exponent and demonstrate several
of their characteristics. We start by reviewing the basic ideas for the case of independent and
identically distributed (i.i.d.) data. A detailed exposition is given in Stoev et al. (2006).
Consider the sequence of block-maxima

$$X_m(k) := \max_{1 \le i \le m} X(m(k-1)+i) = \bigvee_{i=1}^{m} X(m(k-1)+i), \quad k = 1, 2, \ldots, \ m \in \mathbb{N},$$

where $X_m(k)$ denotes the largest observation in the $k$-th block. By (1.1) & (1.2) and the
Fisher-Tippett-Gnedenko Theorem,

$$\Big\{\frac{1}{m^{1/\alpha}}\,X_m(k)\Big\}_{k \in \mathbb{N}} \xrightarrow{d} \{Z(k)\}_{k \in \mathbb{N}}, \quad \text{as } m \to \infty, \tag{2.1}$$

where $\xrightarrow{d}$ denotes convergence of the finite-dimensional distributions, with the $Z(k)$'s being
independent copies of an $\alpha$-Fréchet random variable. A random variable $Z$ is said to be $\alpha$-Fréchet,
$\alpha > 0$, with scale coefficient $\sigma > 0$, if

$$\mathbb{P}\{Z \le x\} = \begin{cases} \exp\{-\sigma^{\alpha} x^{-\alpha}\}, & x > 0 \\ 0, & x \le 0. \end{cases} \tag{2.2}$$

The Fréchet variable $Z$ is said to be standard if $\sigma = 1$.

Thus, for large $m$'s the block-maxima $X_m(k)$'s behave like a sequence of i.i.d. $\alpha$-Fréchet
variables, which suggests the following:



Definition 2.1 A sequence of random variables $X = \{X(k)\}_{k \in \mathbb{N}}$ is said to be max self-similar
with self-similarity parameter $H > 0$, if for any $m > 0$,

$$\Big\{\bigvee_{i=1}^{m} X(m(k-1)+i)\Big\}_{k \in \mathbb{N}} \stackrel{d}{=} \{m^{H} X(k)\}_{k \in \mathbb{N}}, \tag{2.3}$$

with $\stackrel{d}{=}$ denoting equality of the finite-dimensional distributions.

Relationship (2.3) holds asymptotically for i.i.d. data and exactly for Fréchet distributed data.
Hence, any sequence of i.i.d. heavy-tailed variables can be regarded as asymptotically max
self-similar with self-similarity parameter $H = 1/\alpha$. This feature suggests that an estimator of $H$ and
consequently $\alpha$ can be obtained by focusing on the scaling of the maximum values in blocks of
growing size. A similar idea applied to block-wise sums was used in Crovella and Taqqu (1999)
for estimating $\alpha$, in the case $0 < \alpha < 2$.
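To see why (2.3) holds exactly for Fréchet data, one can compute the distribution of a block maximum directly from (2.2). The short derivation below is a worked check, assuming i.i.d. $\alpha$-Fréchet variables $Z(1), \ldots, Z(m)$ with scale coefficient $\sigma$:

```latex
\begin{align*}
\mathbb{P}\Big\{\bigvee_{i=1}^{m} Z(i) \le x\Big\}
  &= \big(\exp\{-\sigma^{\alpha} x^{-\alpha}\}\big)^{m}
   = \exp\{-\sigma^{\alpha}\,(x/m^{1/\alpha})^{-\alpha}\} \\
  &= \mathbb{P}\{m^{1/\alpha} Z(1) \le x\}, \qquad x > 0 .
\end{align*}
```

Since maxima over disjoint blocks of independent variables are independent, the same identity holds jointly in $k$, which gives (2.3) with $H = 1/\alpha$.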

For an i.i.d. sample $X(1), \ldots, X(n)$ from $F$, define

$$D(j,k) := \max_{1 \le i \le 2^j} X(2^j(k-1)+i) = \bigvee_{i=1}^{2^j} X(2^j(k-1)+i), \quad k = 1, 2, \ldots, n_j, \tag{2.4}$$

for all $j = 1, 2, \ldots, [\log_2 n]$, where $n_j := [n/2^j]$ and where $[x]$ denotes the largest integer not
greater than $x \in \mathbb{R}$. By analogy to the discrete wavelet transform, we refer to the parameter $j$
as the scale and to $k$ as the location parameter. We consider dyadic block-sizes because of their
algorithmic and computational advantages. Introduce the statistics

$$Y_j := \frac{1}{n_j}\sum_{k=1}^{n_j} \log_2 D(j,k), \quad j = 1, 2, \ldots, [\log_2 n]. \tag{2.5}$$

The Law of Large Numbers implies that for fixed $j$, as $n_j \to \infty$, the $Y_j$'s are consistent and
unbiased estimators of $\mathbb{E}Y_j = \mathbb{E}\log_2 D(j,1)$, if finite (see Corollary 3.1 in Stoev et al. (2006)).
On the other hand, the asymptotic max self-similarity (2.1) of $X$ and (2.4) suggest that under
additional tail regularity conditions (see e.g. Proposition 3.1 below):

$$\mathbb{E}Y_j = \mathbb{E}\log_2 D(j,1) \simeq jH + C = j/\alpha + C, \quad \text{as } j \to \infty, \tag{2.6}$$

where $C = C(\sigma_0, \alpha) = \mathbb{E}\log_2 \sigma_0 Z$, and where $\simeq$ means that the difference between the left- and
the right-hand side tends to zero, with $Z$ being an $\alpha$-Fréchet variable with unit scale coefficient.
Then, a regression-based estimator of $H = 1/\alpha$ (and hence $\alpha$) for a range of scales
$1 \le j_1 \le j \le j_2 \le [\log_2 n]$ is given by:

$$\widehat H(j_1,j_2) := \sum_{j=j_1}^{j_2} w_j Y_j, \quad \text{and} \quad \hat\alpha(j_1,j_2) := 1/\widehat H(j_1,j_2), \tag{2.7}$$

where the weights $w_j$'s are chosen so that $\sum_{j=j_1}^{j_2} w_j = 0$ and $\sum_{j=j_1}^{j_2} j\,w_j = 1$. The optimal weights
$w_j$'s can be calculated through generalized least squares (GLS) regression using the asymptotic
covariance matrix of the $Y_j$'s. In practice, it is important to at least use weighted least squares
(WLS) regression to account for the difference in the variances of the $Y_j$'s (see Stoev et al. (2006)).
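The construction in (2.4)-(2.7) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `max_spectrum` and `tail_index` are hypothetical, and the WLS weights are taken proportional to $n_j$, which is only a simple stand-in for the variance-based weighting referred to above. The demonstration data are i.i.d. Pareto with tail index 1.5, and the scale range (4, 10) is an arbitrary choice.

```python
import math
import random


def max_spectrum(data):
    """Return (spectrum, counts): spectrum[j-1] = Y_j and counts[j-1] = n_j."""
    level = list(data)  # scale 0: the raw, positive observations
    spectrum, counts = [], []
    while len(level) >= 2:
        # D(j+1, k) = max(D(j, 2k-1), D(j, 2k)); an unpaired last block is dropped.
        level = [max(level[2 * i], level[2 * i + 1]) for i in range(len(level) // 2)]
        spectrum.append(sum(math.log2(d) for d in level) / len(level))
        counts.append(len(level))
    return spectrum, counts


def tail_index(spectrum, counts, j1, j2):
    """Weighted least-squares slope of Y_j on j over scales j1..j2, inverted."""
    scales = list(range(j1, j2 + 1))
    v = [counts[j - 1] for j in scales]  # assumed WLS weights, proportional to n_j
    total = sum(v)
    j_bar = sum(vj * j for vj, j in zip(v, scales)) / total
    sxx = sum(vj * (j - j_bar) ** 2 for vj, j in zip(v, scales))
    # Linear weights w_j with sum(w_j) = 0 and sum(j * w_j) = 1, as in (2.7).
    w = [vj * (j - j_bar) / sxx for vj, j in zip(v, scales)]
    h_hat = sum(wj * spectrum[j - 1] for wj, j in zip(w, scales))
    return 1.0 / h_hat, w


rng = random.Random(0)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(2 ** 15)]
spectrum, counts = max_spectrum(data)
alpha_hat, weights = tail_index(spectrum, counts, 4, 10)
```

Each pass halves the number of blocks, so the total work is proportional to $n$ and no sorting is involved.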
Figure 2: Top panel: auto-regressive time series of order 1 with Pareto innovations of tail exponent $\alpha = 1.5$.
Bottom left and right panels: the Hill plot for this data set and its zoomed-in version, respectively. The
dashed horizontal line indicates the value of $\alpha = 1.5$.



We propose to use the estimator defined in (2.7) for dependent time series data. We first
illustrate its usage through a simulated data example. A data set of size $n = 2^{15} = 32,768$ was
generated from an auto-regressive time series of order one with Pareto innovations. Specifically,

$$X(k) = \phi X(k-1) + Z(k) = \sum_{i=0}^{\infty} \phi^{i} Z(k-i), \quad k = 1, \ldots, n,$$

where $\phi = 0.9$ and $\mathbb{P}\{Z(k) > x\} = x^{-\alpha}$, $x \ge 1$, with $\alpha = 1.5$. The data together with its Hill
plot are shown in Figure 2. Notice that even though the Hill estimator works best for Pareto data,
the dependence structure in the model leads to a Hill plot which is substantially different from that
for independent Pareto data (see the bottom left panel). The zoomed-in version of the Hill plot
(bottom right panel), however, indicates that the tail exponent should be in the range between 1 and
2. The choices of $k$ in the range between 200 and 400 do in fact lead to estimates around 1.5. This
range, however, is hard to guess if one does not know the true value of $\alpha = 1.5$. Resnick and Starica
(1997) have shown that the Hill estimator is consistent for such dependent data sets. Nevertheless,
as this example indicates, the Hill plot can be difficult to assess in practice.
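A data set of the kind described above can be generated with the following Python sketch. It is a hedged illustration rather than the code used for the figures: the function name `ar1_pareto`, the seed and the burn-in length (used to approximate the stationary infinite sum by starting the recursion from zero) are assumptions.

```python
import random


def ar1_pareto(n, phi=0.9, alpha=1.5, burn_in=1000, seed=0):
    """Simulate X(k) = phi * X(k-1) + Z(k) with Pareto(alpha) innovations."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for t in range(n + burn_in):
        # Inverse c.d.f. sampling: P{Z > z} = z^(-alpha) for z >= 1.
        z = (1.0 - rng.random()) ** (-1.0 / alpha)
        x = phi * x + z
        if t >= burn_in:  # discard the transient from the zero start
            path.append(x)
    return path


series = ar1_pareto(2 ** 15)
```

Because the innovations are at least 1 and the coefficient is positive, every simulated value is positive, so logarithms of block maxima are well defined.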



Figure 3: The max-spectrum plot for the data set in Figure 2. The max self-similarity estimator of the tail
exponent, obtained from the range of scales $(j_1, j_2) = (10, 15)$, is $\hat\alpha(10, 15) = 1.4774$.



In Figure 3 the max-spectrum plot is shown, i.e. the plot of the statistics $Y_j$ versus the available
dyadic scales $j$, $1 \le j \le [\log_2 n]\ (= 15)$. The estimated tail exponent over the range of scales
(10, 15) is 1.4774, which is very close to the nominal value of $\alpha = 1.5$. Moreover, the
max-spectrum is easy to assess and interpret. One sees a "knee" in the plot near scale $j = 10$, where
the max-spectrum curves upwards and thus it is natural to choose the range of scales (10, 15) to
estimate $\alpha$. The choice of the scales $(j_1, j_2)$ can also be automated, as briefly discussed in Section
4 below.

Remark: (on the algorithmic implementation) The max-spectrum $Y_j$, $j = 1, \ldots, [\log_2 n]$ of a
data set $X_1, \ldots, X_n$ can be computed efficiently in $O(n)$ steps, without sorting the data. Indeed,
this is evident from the recursive construction of block maxima, since

$$D(j+1,k) = \max\{D(j,2k-1),\, D(j,2k)\}, \quad k = 1, \ldots, [n/2^{j+1}], \ 1 \le j < [\log_2 n].$$

Moreover, this property can be further used to obtain a sequential algorithm for the computation of
the $Y_j$'s. Indeed, keep in addition to the $Y_j$'s, the last block-maximum $D_j := D(j, n_j)$, $n_j =
[n/2^j]$, per scale $j$, and also the extra variables $R_j = \max_{2^j n_j < i \le n} X(i)$, which represent the
maxima of the 'left-over' $X(i)$'s over the range $2^j[n/2^j] < i \le n$. Now, if a new observation
$X_{n+1}$ is recorded, one can easily update the $R_j$'s and the $Y_j$'s, with the help of the $R_{j'}$'s and the
$D_{j'} := D(j', n_{j'})$'s, for $1 \le j' < j$. Thus, one recovers the $(Y_j, D_j, R_j)_j$-representation of the
data $X_1, \ldots, X_{n+1}$. Since only $\log_2 n$ scales are available, we perform $O(\log_2 n)$ operations per
update and use $O(\log_2 n)$ memory to store the max-spectrum and the auxiliary data.

This sequential implementation of the max-spectrum is of critical importance in the context of
data streams in modern data bases or Internet traffic applications. In such settings, large volumes
of data are observed in short amounts of time; they cannot be stored and/or sorted efficiently while
at the same time rapid 'queries' need to be answered about various statistics of the data. The
proposed max-spectrum estimator provides a unique tool for the estimation of the tail-exponent
of such data. Notice that the other available techniques require sorting the data, which is impossible
without having to store the entire data set. A sequential implementation of the Hill estimator, for
example, would require $O(n)$ memory, which is prohibitive in many applications.
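The sequential update described in the Remark can be sketched as follows. This is a hedged variant and not necessarily the bookkeeping used by the authors: instead of storing the pairs $(D_j, R_j)$ explicitly, the hypothetical class `StreamingMaxSpectrum` keeps, per scale, at most one completed block maximum that is still waiting for its partner, in the manner of a binary counter. The memory footprint is still of order $\log_2 n$.

```python
import math


class StreamingMaxSpectrum:
    """Online max-spectrum that stores O(log2 n) numbers."""

    def __init__(self):
        self.pending = []   # pending[j]: unpaired block maximum at scale j (0 = raw data)
        self.log_sums = []  # log_sums[j-1]: sum of log2 D(j, k) over completed blocks
        self.counts = []    # counts[j-1]: number n_j of completed blocks at scale j

    def update(self, x):
        """Absorb one new positive observation x."""
        carry, j = x, 0
        while True:
            if j == len(self.pending):
                self.pending.append(None)
            if self.pending[j] is None:
                self.pending[j] = carry  # wait for the partner block
                return
            # Two blocks at scale j are available: merge them into scale j + 1.
            carry = max(self.pending[j], carry)
            self.pending[j] = None
            j += 1
            if j > len(self.log_sums):
                self.log_sums.append(0.0)
                self.counts.append(0)
            self.log_sums[j - 1] += math.log2(carry)
            self.counts[j - 1] += 1

    def spectrum(self):
        """Current values Y_1, Y_2, ... for all scales with at least one block."""
        return [s / c for s, c in zip(self.log_sums, self.counts)]


stream = StreamingMaxSpectrum()
for value in [1.5, 3.0, 2.0, 8.0, 1.0, 4.0, 16.0]:
    stream.update(value)
```

An update touches as many scales as there are carries, which is at most of order $\log_2 n$, and the raw observations are never stored.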

3 Asymptotic properties 

3.1 Asymptotic Normality (in the intermediate scales regime) 

The estimators $\widehat H$ and $\hat\alpha = 1/\widehat H$ in (2.7) utilize the scaling properties of the max-spectrum
statistics $Y_j$ in (2.5). The discussion in Section 2 suggests that the max self-similarity estimators in
(2.7) will be consistent as both the scale $j$ and $n_j$ tend to infinity. The consistency and asymptotic
normality of these estimators were established in Stoev et al. (2006) for i.i.d. data. This was
accomplished by assessing the rate of convergence of moment type functionals of block-maxima, such
as $\mathbb{E}\log_2 D(j,1)$, under mild conditions on the rate of the tail decay in (1.1). Here, we focus on the
case of dependent data and establish the asymptotic normality of the proposed max self-similarity
estimators under analogous conditions on the rate.

Consider a strictly stationary process (time series) $X = \{X(k)\}_{k \in \mathbb{Z}}$ with heavy-tailed marginal
c.d.f. $F$ as in (1.1) & (1.2). Further, assume that the $X(i)$'s are positive, almost surely, that is,
$F(0) = 0$. In many contexts, the block-maxima of $X$ scale at a rate $m^{1/\alpha}$ as the block size $m$
grows even under the presence of strong dependence. This is so, for example, when the time series
$X$ has a positive extremal index (see p. 53 in Leadbetter et al. (1983)). The following conditions
make this more precise by quantifying further the rate of convergence.
Let $M_n := \max_{1 \le k \le n} X(k)$ and let

$$F_n(x) := \mathbb{P}\{M_n/n^{1/\alpha} \le x\} =: \exp\{-c(n,x)\,x^{-\alpha}\}, \quad x > 0, \ n \in \mathbb{N}.$$

One can see that $M_n/n^{1/\alpha} \xrightarrow{d} Z$, $n \to \infty$, if and only if $c(n,x) \to c_X = \text{const}$, $n \to \infty$, for
all $x > 0$, where $Z$ is an $\alpha$-Fréchet variable with scale $\sigma = c_X^{1/\alpha}$. The following conditions will
help us quantify the rate of the last convergence and also obtain rates of convergence for moment
functionals of block-maxima in Proposition 3.1 below.

Condition 3.1 There exist $\beta > 0$ and $R \in \mathbb{R}$, such that

$$|c(n,x) - c_X| \le c_1(x)\,n^{-\beta}, \ \text{for all } x > 0, \quad \text{and} \quad c_1(x) = O(x^{-R}), \ x \downarrow 0, \tag{3.1}$$

for some $c_X > 0$.

Condition 3.2 For all $x > 0$, we have

$$c(n,x) \ge c_2 \min\{1, x^{\gamma}\}, \quad \text{for some } \gamma \in (0, \alpha), \tag{3.2}$$

for all sufficiently large $n \in \mathbb{N}$, where $c_2 > 0$ does not depend on $n$.

Remarks

1. Conditions 3.1 and 3.2 are not very restrictive. They can be shown to hold, for example, for
a large class of moving maxima processes (see Proposition 6.2).

2. Condition 3.1 implies in particular that $M_n/n^{1/\alpha} \xrightarrow{d} c_X^{1/\alpha} Z$, as $n \to \infty$, for a standard
$\alpha$-Fréchet variable $Z$. In view of (1.1), we also have that $M_n^*/n^{1/\alpha} \xrightarrow{d} \sigma_0 Z$, as $n \to \infty$,
with $M_n^* := \max_{1 \le k \le n} X(k)^*$, where the $X(k)^*$'s are i.i.d. random variables with c.d.f. $F$.
This implies that the extremal index $\theta$ of the time series $X$ is: $\theta := c_X/\sigma_0^{\alpha}$ (see, e.g. p. 53 in
Leadbetter et al. (1983)).



Conditions 3.1 & 3.2 yield the following important result on the rate of convergence of log-block
maxima, similar to Corollary 3.1 in Stoev et al. (2006).



Proposition 3.1 Let $X = \{X(k)\}_{k \in \mathbb{Z}}$ be a strictly stationary time series which satisfies
Conditions 3.1 & 3.2. Suppose that $\int^{\infty} c_1(x)\,x^{-\alpha-1+\delta}\,dx < \infty$, for some $\delta > 0$. Then, with
$M_n := \max_{1 \le k \le n} X(k)$, we have $\mathbb{E}|\ln(M_n)|^p < \infty$, for all $p > 0$ and all sufficiently large $n \in \mathbb{N}$.
Moreover, for any $p > 0$ and $k \in \mathbb{N}$, we have:

$$\mathbb{E}|\ln(M_n/n^{1/\alpha})|^p - \mathbb{E}|\ln(Z)|^p = O(n^{-\beta}), \quad \text{and} \quad \mathbb{E}(\ln(M_n/n^{1/\alpha}))^k - \mathbb{E}(\ln(Z))^k = O(n^{-\beta}),$$

as $n \to \infty$, where $Z$ is an $\alpha$-Fréchet random variable with scale coefficient $c_X^{1/\alpha}$.



The proof is given in Section 6. Proposition 3.1 readily implies:

$$\mathbb{E}(Y_j - j/\alpha) = \mathbb{E}\log_2\big(D(j,k)/2^{j/\alpha}\big) = \mathbb{E}\log_2\big(c_X^{1/\alpha} Z_1\big) + O(1/2^{j\beta}), \tag{3.3}$$

as $j \to \infty$, where $Z_1$ is a standard $\alpha$-Fréchet variable. This result yields an asymptotic bound on
the bias of the estimators $\widehat H(j_1,j_2)$ in (2.7) above.

Proposition 3.1 can be further used to establish the asymptotic normality of $\hat\alpha(j_1,j_2) =
1/\widehat H(j_1,j_2)$ in (2.7). To do so, we focus on a range of scales $(j_1, j_2)$ which grows with the sample
size. Namely, we fix $\ell \in \mathbb{N}$, $\ell \ge 2$, let $j_1 := 1 + j(n)$ & $j_2 := \ell + j(n)$, and as in (2.7) define:

$$\widehat H_n := \sum_{i=1}^{\ell} w_i Y_{i+j(n)} \quad \text{and} \quad \hat\alpha_n := 1/\widehat H_n,$$

where $\sum_{i=1}^{\ell} i^{k} w_i = k$, $k = 0, 1$. The next theorem is the main result of this section. It establishes
the asymptotic normality of the estimator $\hat\alpha_n$, as $j(n)$ and $n$ tend to infinity.

Theorem 3.1 Let $X_1, \ldots, X_n$ be a sample from an $m$-dependent heavy-tailed process $X =
\{X(k)\}_{k \in \mathbb{Z}}$. Suppose that (3.1) and (3.2) hold and let $j = j(n) \in \mathbb{N}$ be such that

$$2^{j(n)}/n + n/2^{j(n)(1+2\min\{1,\beta\})} \to 0, \quad \text{as } n \to \infty. \tag{3.4}$$

Then, as $n \to \infty$,

$$\sqrt{n_j}\,(\hat\alpha_n - \alpha) \xrightarrow{d} \mathcal{N}(0, \alpha^2 c_w), \quad \text{with} \quad c_w = \sum_{i',i''=1}^{\ell} w_{i'} w_{i''} \Sigma_1(i',i''), \tag{3.5}$$

where $n_j = [n/2^j]$. Here $\Sigma_1(i',i'') = 2^{\min\{i',i''\}}\,\mathrm{Cov}\big(\log_2 Z_1, \log_2(Z_1 \vee (2^{|i'-i''|} - 1) Z_2)\big)$, for
$i', i'' = 1, \ldots, \ell$, where $Z_1$ and $Z_2$ are independent standard 1-Fréchet random variables.



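Proof: By the 'Delta-method' (see e.g. Theorem 3.1 in van der Vaart (1998)), it suffices to
show that

$$\sqrt{n_j}\,(\widehat H_n - H) \xrightarrow{d} \mathcal{N}(0, H^2 c_w), \quad \text{as } n \to \infty. \tag{3.6}$$

Indeed, since $\hat\alpha_n = f(\widehat H_n)$, with $f(x) = 1/x$, we have $\hat\alpha_n - \alpha = -H^{-2}(\widehat H_n - H) + o_p(\widehat H_n - H)$,
as $n \to \infty$.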
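For completeness, the Delta-method step can be written out explicitly. Taking (3.6) as given and using $f(x) = 1/x$, so that $f'(H) = -1/H^2 = -\alpha^2$, one obtains:

```latex
\begin{align*}
\sqrt{n_j}\,(\hat\alpha_n - \alpha)
  &= f'(H)\,\sqrt{n_j}\,(\widehat H_n - H) + o_p(1)
   = -\alpha^{2}\,\sqrt{n_j}\,(\widehat H_n - H) + o_p(1), \\
\text{limiting variance}
  &= \alpha^{4} \cdot H^{2} c_w
   = \alpha^{4} \cdot \alpha^{-2} c_w
   = \alpha^{2} c_w ,
\end{align*}
```

which is the variance appearing in (3.5).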
Let now

$$\widetilde D(j',k) := \bigvee_{i=1}^{2^{j'}-m} X(2^{j'}(k-1)+i) \quad \text{and} \quad \widetilde Y_{j'} := \frac{1}{n_{j'}}\sum_{k=1}^{n_{j'}} \log_2 \widetilde D(j',k),$$

for all $j_1 \le j' \le j_2$, and $k = 1, \ldots, n_{j'}$, where $n_{j'} := [n/2^{j'}]$. Observe that since the time series
$X$ is $m$-dependent, the $\widetilde D(j',k)$'s are now independent in $k$. Hence, in view of Conditions 3.1
& 3.2 and Proposition 3.1, the results of Theorem 4.1 in Stoev et al. (2006) readily apply to the
max-spectrum $\widetilde Y_{j'}$, $j' = j_1, \ldots, j_2$, which is based on the independent $\widetilde D(j',k)$'s. Therefore, by
setting $\widetilde H_n := \sum_{i=1}^{\ell} w_i \widetilde Y_{i+j(n)}$, we obtain:

$$\sqrt{n_j}\,(\widetilde H_n - H) \xrightarrow{d} \mathcal{N}(0, H^2 c_w), \quad \text{as } n \to \infty. \tag{3.7}$$


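In view of (3.7), to establish (3.6), it is enough to show that $\widetilde H_n - \widehat H_n = o_p(1/\sqrt{n_j})$, $n \to \infty$, or
that, for example,

$$\mathbb{E}(\widetilde H_n - \widehat H_n)^2 = \mathrm{Var}(\widetilde H_n - \widehat H_n) + (\mathbb{E}\widetilde H_n - \mathbb{E}\widehat H_n)^2 = o(1/n_j), \quad \text{as } n \to \infty. \tag{3.8}$$

Consider first the term $\mathrm{Var}(\widetilde H_n - \widehat H_n)$. Since $\widetilde H_n - \widehat H_n = \sum_{i=1}^{\ell} w_i(\widetilde Y_{i+j(n)} - Y_{i+j(n)})$, we have

$$\mathrm{Var}(\widetilde H_n - \widehat H_n) \le C \sum_{i=1}^{\ell} \mathrm{Var}(\widetilde Y_{i+j(n)} - Y_{i+j(n)}) \le \frac{C'}{n_j} \sum_{i=1}^{\ell} \mathrm{Var}\big(\log_2 \widetilde D(i+j(n),1) - \log_2 D(i+j(n),1)\big),$$

for some constants $C$ and $C'$, where the last inequality follows from Lemma 6.1. Now, Lemmas
6.2 and 6.3 imply that $\mathrm{Var}(\log_2 \widetilde D(i+j(n),1) - \log_2 D(i+j(n),1)) \to 0$, $n \to \infty$, and hence
$\mathrm{Var}(\widetilde H_n - \widehat H_n) = o(1/n_j)$, as $n \to \infty$.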

Now, focus on the term $(\mathbb{E}\widetilde H_n - \mathbb{E}\widehat H_n)^2$ in (3.8). For some constant $C_\ell > 0$, we have

$$(\mathbb{E}\widetilde H_n - \mathbb{E}\widehat H_n)^2 \le C_\ell \sum_{i=1}^{\ell} (\mathbb{E}\widetilde Y_{i+j(n)} - \mathbb{E}Y_{i+j(n)})^2 = C_\ell \sum_{i=1}^{\ell} \Big(\mathbb{E}\log_2 \frac{D(i+j(n),1)}{\widetilde D(i+j(n),1)}\Big)^2, \tag{3.9}$$

where we used the inequality $(\sum_{i=1}^{\ell} x_i)^2 \le \ell \sum_{i=1}^{\ell} x_i^2$ and the stationarity (in $k$) of the
$D(i+j(n),k)$'s and $\widetilde D(i+j(n),k)$'s. We further have that

$$\mathbb{E}\log_2 \frac{D(i+j(n),1)}{\widetilde D(i+j(n),1)} = \mathbb{E}\log_2 \frac{D(i+j(n),1)}{2^{(i+j(n))/\alpha}} - \mathbb{E}\log_2 \frac{\widetilde D(i+j(n),1)}{(2^{i+j(n)} - m)^{1/\alpha}} - \frac{1}{\alpha}\log_2\Big(1 - \frac{m}{2^{i+j(n)}}\Big). \tag{3.10}$$

Relation (3.3) implies that the last two expectations are both equal to $\mathbb{E}\log_2(Z) + O(1/2^{j(n)\beta})$, as
$n \to \infty$, where $Z$ is an $\alpha$-Fréchet variable with scale $c_X^{1/\alpha}$. Therefore, from (3.9) and (3.10), we
obtain

$$(\mathbb{E}\widetilde H_n - \mathbb{E}\widehat H_n)^2 = O(1/2^{2j(n)\beta}) + O(1/2^{2j(n)}) = O(1/2^{2j(n)\min\{1,\beta\}}), \quad \text{as } n \to \infty,$$

where in the last relation we used that $\log_2(1 - x) = O(x)$, $x \to 0$.

By combining the above derived bounds on the terms on the right-hand side of (3.8), we get

$$\widetilde H_n - \widehat H_n = o_p(1/\sqrt{n_j}) + O_p(1/2^{j(n)\min\{1,\beta\}}) = o_p(1/\sqrt{n_j}), \quad \text{as } n \to \infty,$$

where the last equality follows from (3.4) since $n_j/2^{2j(n)\min\{1,\beta\}} \le n/2^{j(n)(1+2\min\{1,\beta\})} \to 0$, as
$n \to \infty$. This implies (3.6) and completes the proof of the theorem. □



We conclude this section with several important remarks on the scope of validity of the
asymptotic results in Theorem 3.1.

Remarks

1. (On the role of $\beta$) The parameter $\beta > 0$ in Condition 3.1 controls the rate of the convergence
in distribution of $M_n/n^{1/\alpha}$ to the $\alpha$-Fréchet limit law. The larger the value of $\beta$, the faster
the convergence in (3.1), and in view of (3.4), the wider the range of scales $j(n)$'s in Theorem
3.1 that lead to asymptotically normal $\hat\alpha_n$'s. In particular, the larger the $\beta$, the faster the
convergence of the $\hat\alpha_n$'s can be made, since one could choose relatively small $j(n)$'s.

On the other hand, when the rate of convergence of the law of $M_n/n^{1/\alpha}$ to its limit is
relatively slow, then the values of $\beta > 0$ can be close to zero. This can lead to arbitrarily slow
rates of the convergence of $\hat\alpha_n$ since one may have to choose relatively large scales $j(n)$'s to
compensate for the rate of the bias in the max-spectrum on smaller scales.



2. (On the connection with Hill estimators) As argued in Stoev et al. (2006), for the case of
independent data, Condition 3.1 corresponds precisely to the second-order condition used in
Hall (1982), where the asymptotic normality of the Hill estimator was established. The rates
of convergence in (3.5) above are, in the case of independent data, in close correspondence
with the rates for the Hill estimator, obtained in Hall (1982).



3. (On data with regularly varying tails) Consider the case when the $X(k)$'s satisfy (1.1) where
now the slowly varying function $L(\cdot)$ is non-trivial. Then, the max-spectrum based
estimators of $\alpha$ will continue to work. Indeed, for the case of i.i.d. data, we have that

$$\frac{M_n}{a_n} \xrightarrow{d} Z, \quad \text{as } n \to \infty,$$

where $Z$ is a standard $\alpha$-Fréchet variable, and where $a_n = n^{1/\alpha}\ell(n)$ is such that

$$n\,a_n^{-\alpha} L(a_n) = \frac{1}{\ell(n)^{\alpha}}\,L(n^{1/\alpha}\ell(n)) \to 1, \quad \text{as } n \to \infty.$$

Here $\ell(\cdot)$ is another slowly varying function related to $L$ (see e.g. Proposition 1.11 in Resnick
(1987)).

If one replaces $n^{1/\alpha}$ by $a_n = n^{1/\alpha}\ell(n)$ and $c_X$ by 1 in Conditions 3.1 and 3.2 then
Propositions 3.1 and 6.1 will continue to hold with $M_n/n^{1/\alpha}$ replaced by $M_n/a_n$. The proofs are
essentially the same. In the case of independent data, one has that

$$\mathrm{Var}(Y_j) = \frac{1}{n_j}\mathrm{Var}\big(\log_2 D(j,1)/(2^{j/\alpha}\ell(2^j)^{1/\alpha})\big) = \frac{1}{n_j}\big(\mathrm{Var}(\log_2 Z) + o(1)\big), \tag{3.11}$$

where the remainder term $o(1)$ vanishes, as $j \to \infty$, because of the analog of Relation (3.3).
Also, by the counterpart of (3.3), one obtains:

$$\mathbb{E}\big(Y_j - j/\alpha - \log_2(\ell(2^j))/\alpha\big) = \mathbb{E}\log_2(Z) + O(1/2^{j\beta}), \quad \text{as } j \to \infty. \tag{3.12}$$

Consider now a fixed $m \in \mathbb{N}$, $m \ge 2$ and let

$$\widehat H_n = \sum_{i=1}^{m} w_i Y_{i+j(n)}.$$

Relation (3.11) implies that $\mathrm{Var}(\widehat H_n) \to 0$, as $n_j$ and $j(n)$ tend to infinity. On the other hand,
Relation (3.12) shows that

$$\mathbb{E}\widehat H_n = \frac{1}{\alpha}\sum_{i=1}^{m} i\,w_i + O(1/2^{j(n)\beta}) + \big(j(n)/\alpha + \mathbb{E}\log_2(Z)\big)\sum_{i=1}^{m} w_i + \frac{1}{\alpha}\sum_{i=1}^{m} w_i \log_2\big(\ell(2^{i+j(n)})\big)$$

$$= \frac{1}{\alpha} + O(1/2^{j(n)\beta}) + \frac{1}{\alpha}\sum_{i=1}^{m} w_i \log_2\Big(\ell(2^{i} \cdot 2^{j(n)})/\ell(2^{j(n)})\Big), \tag{3.13}$$

where in the last two relations we used the facts that $\sum_{i=1}^{m} i\,w_i = 1$ and $\sum_{i=1}^{m} w_i = 0$.
Now, the fact that $\ell(\cdot)$ is a slowly varying function implies that $\ell(2^{i} \cdot 2^{j(n)})/\ell(2^{j(n)}) \to 1$,
as $j(n) \to \infty$. This shows that the right-hand side of (3.13) converges to $H = 1/\alpha$, as
$j(n) \to \infty$, and hence the estimator $\widehat H_n$ is consistent, as $n \to \infty$ and as $j(n) \to \infty$. Note
that the rate of the bias $(\mathbb{E}\widehat H_n - H)$ depends not only on the term $O(1/2^{j(n)\beta})$ but also on the
rate of the convergence

$$\ell(\lambda j)/\ell(j) \to 1, \quad \text{as } j \to \infty.$$

This last rate depends on the structure of the slowly varying function $\ell(\cdot)$ and it may be
possible to control it in terms of Karamata's integral representation

$$\ell(x) = c_1(x)\exp\Big\{-\int_x^{\infty} \epsilon(u)/u\,du\Big\},$$

at the expense, however, of two additional parameters controlling the rates of $c_1(\cdot)$ and $\epsilon(\cdot)$.

This argument shows the consistency of the max-spectrum based estimator of $\alpha$ for i.i.d.
$X(k)$'s with regularly varying tails. In principle, one can establish asymptotic normality
of these estimators along similar lines, but this would involve technically complicated
assumptions on the slowly varying functions considered. Further, as in Theorem 3.1, one can
establish asymptotic normality results for the $\hat\alpha_n$'s for $m$-dependent data. We chose not to
pursue the general case of regularly varying tails here since the technical details may
obscure the idea behind the estimator. These important theoretical results will be pursued in
subsequent work on the subject.

3.2 Distributional consistency (in the large scales regime) 

In Theorem 3.1 we consider an asymptotic regime where the number of block-maxima $n_j$ on the
scale $j = j(n)$ grows, as $n \to \infty$. This is essential for the consistency of the estimators $\hat\alpha_n$. In
practice, however, the situation where we have a fixed number of block-maxima per scale is also
of interest. Namely, for a sample $X(1), \ldots, X(n)$ and a fixed number of block-maxima $r$, we let
$j(n) := [\log_2(n/r)]$ and consider the estimator

$$\hat\alpha_n := 1\Big/\Big(\sum_{i=1}^{\ell} w_i Y_{i+j(n)}\Big), \quad \text{where } \ell = [\log_2 r]. \tag{3.14}$$

This estimator corresponds to taking the largest $\ell$ scales in the max-spectrum, where $\ell$ is fixed.
One cannot expect the estimators $\hat\alpha_n$ to be consistent (even for independent data) since they involve
averages over a fixed number of block-maxima statistics. Nevertheless, the asymptotic distribution
of $\hat\alpha_n$ is of interest.

The next result establishes the 'distributional consistency' of the estimators $\hat\alpha_n$ in the
aforementioned regime. We do so under the condition that the block-maxima in (2.1) are asymptotically
independent. This condition is in fact quite mild, as shown in Lemmas 3.1 and 3.2 below.

Theorem 3.2 Suppose that (2.1) holds where the $Z(k)$'s are i.i.d. $\alpha$-Fréchet. Then,

$$\hat\alpha_n \xrightarrow{d} \hat\alpha_Z, \quad \text{as } n \to \infty, \tag{3.15}$$

where $\hat\alpha_Z := 1/(\sum_{i=1}^{\ell} w_i Y_i^Z)$, and where $\{Y_i^Z\}_{i=1}^{\ell}$ is the max-spectrum of a sequence of i.i.d.
$\alpha$-Fréchet variables $Z(1), \ldots, Z(r)$.

Proof: The result readily follows from the continuous mapping theorem. Indeed, by (2.1), and
in view of the definition of the block-maxima $D(j,k)$, we have

$$\{D(j(n),k)/2^{j(n)/\alpha},\ k = 1, \ldots, r\} \xrightarrow{d} \{Z(k),\ k = 1, \ldots, r\}, \quad \text{as } n \to \infty,$$

and hence

$$\{\log_2 D(j(n),k) - j(n)/\alpha,\ k = 1, \ldots, r\} \xrightarrow{d} \{\log_2 Z(k),\ k = 1, \ldots, r\}, \quad \text{as } n \to \infty.$$

Due to the dyadic structure of the block-maxima $D(j(n),k)$'s, one can recover the $D(i + j(n),k)$'s,
for $i = 1, \ldots, \ell$ and $k = 1, \ldots, [r/2^i]$, from the block-maxima $D(j(n),k)$, $k = 1, \ldots, r$, through a
continuous combination of maxima operations. Thus, by applying the continuous mapping theorem
again, we obtain

$$\{Y_{i+j(n)} - j(n)/\alpha\}_{i=1}^{\ell} \xrightarrow{d} \{Y_i^Z\}_{i=1}^{\ell}, \quad \text{as } n \to \infty,$$

which yields the convergence (3.15) since $\sum_{i=1}^{\ell} w_i\, j(n)/\alpha = 0$. □
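
The dyadic structure used in the proof can be checked numerically. The sketch below (plain Python; the helper names `block_maxima` and `coarsen` are illustrative and not from the paper) recovers the block-maxima at scale $j+1$ from those at scale $j$ by taking maxima of adjacent pairs.

```python
def block_maxima(x, j):
    """Block maxima D(j, k) over consecutive non-overlapping blocks of size 2**j."""
    size = 2 ** j
    return [max(x[k * size:(k + 1) * size]) for k in range(len(x) // size)]


def coarsen(maxima):
    """Recover D(j + 1, .) from D(j, .) by taking maxima of adjacent pairs."""
    return [max(maxima[2 * k], maxima[2 * k + 1]) for k in range(len(maxima) // 2)]


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
scale_1 = block_maxima(x, 1)   # maxima over blocks of size 2
scale_2 = coarsen(scale_1)     # maxima over blocks of size 4, obtained from scale 1
```

Since `max` is a continuous map, this is the "continuous combination of maxima operations" to which the continuous mapping theorem is applied.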

Condition (2.1) appears stringent, but contrary to intuition, it holds in most practical situations.
We were unable to find an example of an ergodic heavy-tailed time series $X$ (of positive extremal
index) with asymptotically dependent block-maxima. We next show that (2.1) holds for the large
class of linear processes.

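Let $\xi_k$, $k \in \mathbb{Z}$, be i.i.d. heavy-tailed innovations, such that $\mathbb{P}\{|\xi_k| > x\} \sim \sigma_\xi^{\alpha} x^{-\alpha}$, $x \to \infty$,
where $\mathbb{P}\{\xi_k > x\}/\mathbb{P}\{|\xi_k| > x\} \to p$, $x \to \infty$, for some $p \in [0, 1]$. Consider the linear process

$$X(k) := \sum_{j=0}^{\infty} c_j \xi_{k-j}, \quad k \in \mathbb{Z}. \tag{3.16}$$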



Mikosch and Samorodnitsky (2000) provide a recent and comprehensive treatment of linear
processes as in (3.16) (see also Davis and Resnick (1985)). More precisely, by Lemma A.3 in
Mikosch and Samorodnitsky (2000), the following conditions on the $c_j$'s guarantee the almost sure
convergence of the series in (3.16):

$$\sum_{j} c_j^2 < \infty \ \ (\text{if } \alpha > 2) \quad \text{and} \quad \sum_{j} |c_j|^{\alpha - \epsilon} < \infty \ \ (\text{if } \alpha \le 2), \ \text{for some } \epsilon > 0. \tag{3.17}$$



These conditions are necessary for $\alpha > 0$, and nearly optimal for $\alpha \le 2$ (see Lemma A.3 in
Mikosch and Samorodnitsky (2000)). By Lemma A.3 in the last reference, we also have that the
tails of the $X(k)$'s are regularly varying with exponent $\alpha$ (see Relation (A.2) therein).
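
To make the construction in (3.16) concrete, here is a minimal simulation sketch. It assumes exact Pareto magnitudes for the innovations, a sign that is positive with probability $p$, and a finite list of coefficients (that is, the series is truncated); the function names are hypothetical.

```python
import random


def pareto_innovations(n, alpha, p=0.5, rng=None):
    """i.i.d. innovations with P(|xi| > x) = x**(-alpha) for x >= 1,
    positive with probability p (an assumed, simple heavy-tailed model)."""
    rng = rng or random.Random()
    out = []
    for _ in range(n):
        magnitude = (1.0 - rng.random()) ** (-1.0 / alpha)
        sign = 1.0 if rng.random() < p else -1.0
        out.append(sign * magnitude)
    return out


def linear_process(innov, coeffs):
    """X(k) = sum_j coeffs[j] * innov[k - j], for all k with a full history."""
    m = len(coeffs)
    return [sum(coeffs[j] * innov[k - j] for j in range(m))
            for k in range(m - 1, len(innov))]


path = linear_process([1.0, 2.0, 3.0, 4.0], [1.0, 0.5])
```

With geometrically decaying coefficients, for example `[0.8 ** j for j in range(50)]`, the summability conditions in (3.17) hold for every $\alpha > 0$.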

The following result shows that (2.1) holds for the linear process $X = \{X(k)\}_{k\in\mathbb{Z}}$, under the
conditions (3.17). The proof follows by a simple combination of arguments in Davis and Resnick
(1985) and Mikosch and Samorodnitsky (2000) and it is given in the Appendix, for completeness.



Lemma 3.1 Let $c_+ := \max_{j \ge 0} c_j$ and $c_- := \max_{j \ge 0} (-c_j)$. Suppose that either $p\,c_+ > 0$ or
$(1 - p)\,c_- > 0$. Then, the linear process in (3.16) satisfies (2.1) where the $Z(k)$'s are i.i.d. $\alpha$-Fréchet
with scale coefficient $\sigma_\xi \big(p\,c_+^{\alpha} + (1 - p)\,c_-^{\alpha}\big)^{1/\alpha}$.

The next result provides some further insight into the observed independence phenomenon for
block-maxima. Namely, it turns out that the block-maxima of a heavy-tailed time series are always
asymptotically independent, provided that they converge to a max-stable process.

Lemma 3.2 Let $X = \{X(k)\}_{k\in\mathbb{Z}}$ be a heavy-tailed time series with marginal distributions as in
(1.1). Suppose that (2.1) holds where the $Z(k)$'s are not assumed independent.

If the limit time series $Z = \{Z(k)\}_{k\in\mathbb{N}}$ is multivariate max-stable, then it consists of i.i.d.
random variables.

The proof is given in the Appendix.

              j_1:      3      4      5      6      7      8      9     10     11
90% c.i.
  φ = 0.1           0.891  0.894  0.912  0.919  0.897  0.903  0.889  0.895  0.875
  φ = 0.3           0.759  0.888  0.914  0.915  0.899  0.901  0.889  0.895  0.875
  φ = 0.5           0.229  0.772  0.889  0.915  0.892  0.899  0.888  0.895  0.875
  φ = 0.7           0.000  0.299  0.801  0.895  0.895  0.899  0.887  0.895  0.875
  φ = 0.9           0.000  0.000  0.070  0.641  0.843  0.890  0.877  0.890  0.875
95% c.i.
  φ = 0.1           0.943  0.952  0.954  0.953  0.949  0.950  0.931  0.931  0.904
  φ = 0.3           0.844  0.940  0.952  0.953  0.949  0.950  0.931  0.931  0.904
  φ = 0.5           0.321  0.854  0.950  0.954  0.948  0.950  0.931  0.931  0.904
  φ = 0.7           0.000  0.395  0.872  0.946  0.944  0.950  0.931  0.931  0.904
  φ = 0.9           0.000  0.000  0.123  0.738  0.911  0.941  0.927  0.930  0.904
99% c.i.
  φ = 0.1           0.990  0.990  0.989  0.991  0.987  0.993  0.975  0.972  0.947
  φ = 0.3           0.946  0.985  0.990  0.991  0.987  0.992  0.975  0.972  0.947
  φ = 0.5           0.552  0.953  0.984  0.990  0.987  0.991  0.975  0.972  0.947
  φ = 0.7           0.000  0.642  0.959  0.981  0.988  0.990  0.974  0.972  0.947
  φ = 0.9           0.000  0.000  0.276  0.897  0.968  0.984  0.973  0.972  0.947

Table 3.1: Coverage probabilities of the asymptotic confidence intervals (3.18) for $\alpha$ for max-AR(1) time
series as in (3.19) of length $2^{15}$. Max self-similarity estimators $\hat H = \hat H(j_1, j_2)$ were used with $1 \le j_1 < j_2$
and $j_2 = 15$. Results for three confidence levels: 90%, 95% and 99% are shown for different values of $j_1$.



3.3 On the construction of confidence intervals 

In many applications, an uncertainty assessment about the estimated tail exponent is important, 
which requires the construction of confidence intervals. 

The literature is rather sparse for confidence intervals for the heavy tail exponent even in the
case of independent data. We are not aware of any general results on the asymptotic distribution
of the Hill or the moment estimator of $\alpha$ for dependent data. Theorem 3.1 above suggests the
following asymptotic confidence interval for $\alpha$ of level $\gamma$, $0 < \gamma < 1$:



$$\Big( \big(\hat H + \hat H\, z_{(1+\gamma)/2}\, c_w\big)^{-1},\ \big(\hat H - \hat H\, z_{(1+\gamma)/2}\, c_w\big)^{-1} \Big), \tag{3.18}$$



where $z_{(1+\gamma)/2}$ is the $(1 + \gamma)/2$-quantile of the standard normal distribution, and where $c_w$ is as in
Theorem 3.1. Here, as recommended in Stoev et al. (2006), we use the reciprocal of a symmetric
confidence interval for $H$ to obtain one for $\alpha = 1/H$ (see also (3.6)).
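
The reciprocal construction described above can be sketched in a few lines of Python. The sketch takes the standard error of $\hat H$ as an input `se_h`; how that standard error is obtained from $c_w$ in Theorem 3.1 is not reproduced here, so `se_h` and the function name are assumptions.

```python
from statistics import NormalDist


def alpha_confidence_interval(h_hat, se_h, level=0.95):
    """Invert a symmetric normal interval for H = 1/alpha into one for alpha.

    se_h is an assumed, user-supplied standard error of h_hat.
    """
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)   # (1 + gamma)/2 quantile
    lo_h, hi_h = h_hat - z * se_h, h_hat + z * se_h
    if lo_h <= 0.0:
        raise ValueError("interval for H reaches zero; reciprocal is unbounded")
    # Taking reciprocals reverses the order of the endpoints.
    return 1.0 / hi_h, 1.0 / lo_h


ci_low, ci_high = alpha_confidence_interval(h_hat=0.5, se_h=0.05, level=0.95)
```

The resulting interval for $\alpha$ is not symmetric about $1/\hat H$, which is the point of inverting the interval for $H$ rather than building one for $\alpha$ directly.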

Tables 3.1 and 3.2 illustrate coverage probabilities of confidence intervals for $\alpha$, based on
Theorems 3.1 and 3.2, respectively. They are based on 1,000 independent replications of max-AR(1)
time series $X = \{X(k)\}_{k\in\mathbb{Z}}$:

$$X(k) := \phi X(k-1) \vee Z(k) = \bigvee_{i=0}^{\infty} \phi^{i} Z(k-i), \quad k = 1, \ldots, n, \tag{3.19}$$

of size $n = 2^{15} = 32{,}768$ for different values of $\phi$. Here the $Z(k)$'s are i.i.d. and $\alpha$-Fréchet with
$\alpha = 1.5$. The coverage probabilities for 90%, 95% and 99% levels of confidence are reported in
each row, as a function of $j_1$.

              j_1:      5      6      7      8      9     10     11     12     13
90% c.i.
  φ = 0.1           0.884  0.909  0.903  0.907  0.901  0.914  0.887  0.902  0.917
  φ = 0.3           0.903  0.906  0.915  0.888  0.898  0.910  0.906  0.907  0.916
  φ = 0.5           0.911  0.908  0.905  0.905  0.898  0.906  0.890  0.902  0.898
  φ = 0.7           0.837  0.885  0.879  0.898  0.906  0.908  0.907  0.906  0.899
  φ = 0.9           0.103  0.735  0.863  0.888  0.894  0.909  0.920  0.909  0.915
95% c.i.
  φ = 0.1           0.945  0.953  0.950  0.947  0.947  0.951  0.946  0.953  0.959
  φ = 0.3           0.956  0.944  0.953  0.941  0.942  0.955  0.946  0.963  0.956
  φ = 0.5           0.949  0.955  0.956  0.947  0.935  0.945  0.947  0.952  0.933
  φ = 0.7           0.894  0.949  0.939  0.949  0.939  0.956  0.954  0.947  0.947
  φ = 0.9           0.163  0.820  0.935  0.943  0.934  0.958  0.959  0.959  0.957
99% c.i.
  φ = 0.1           0.992  0.993  0.992  0.994  0.984  0.989  0.989  0.991  0.997
  φ = 0.3           0.995  0.991  0.987  0.993  0.985  0.992  0.992  0.992  0.998
  φ = 0.5           0.990  0.996  0.991  0.997  0.993  0.986  0.984  0.988  0.980
  φ = 0.7           0.953  0.990  0.989  0.992  0.990  0.994  0.988  0.994  0.993
  φ = 0.9           0.337  0.933  0.984  0.990  0.980  0.995  0.984  0.989  0.993

Table 3.2: Coverage probabilities of empirical confidence intervals based on Theorem 3.2 for $\alpha$ for
max-AR(1) time series as in (3.19) of length $2^{15}$. Max self-similarity estimators $\hat H = \hat H(j_1, j_2)$ were used with
$1 \le j_1 < j_2$ and $j_2 = 15$. Results for three confidence levels: 90%, 95% and 99% are shown for different
values of $j_1$.
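
For readers who wish to reproduce experiments of this kind, the recursion in (3.19) can be simulated as follows. This is a minimal sketch: the inverse-CDF sampler assumes unit scale, and the burn-in period and function names are illustrative choices rather than details taken from the paper.

```python
import math
import random


def frechet(alpha, rng):
    """Inverse-CDF draw from the alpha-Frechet law, P(Z <= z) = exp(-z**(-alpha))."""
    u = rng.random()
    while u == 0.0:           # avoid log(0)
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / alpha)


def max_ar1(n, phi, alpha, rng=None, burn_in=200):
    """Simulate X(k) = max(phi * X(k-1), Z(k)) and return n values after burn-in."""
    rng = rng or random.Random()
    x = frechet(alpha, rng)
    path = []
    for t in range(n + burn_in):
        x = max(phi * x, frechet(alpha, rng))
        if t >= burn_in:
            path.append(x)
    return path


sample = max_ar1(n=1000, phi=0.9, alpha=1.5, rng=random.Random(0))
```

Values of $\phi$ close to 1 produce long runs in which a single large innovation decays geometrically, which is the dependence that the tables above probe.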

Observe that when the data are closer to independent ($\phi = 0.1$), the coverage probabilities
match the nominal values even for small $j_1$'s. As the degree of dependence grows, larger values
of $j_1$ are required to achieve accurate coverage probabilities. Nevertheless, even in the most
dependent setting ($\phi = 0.9$) the value of $j_1 = 8$ in Table 3.1 yields very good results.

Observe that coverage probabilities in Table 3.1 deteriorate for very large scales $j_1$. This is due
to the inadequacy of the normal approximation in Theorem 3.1 in the presence of a limited number
of block-maxima. For large $j_1$'s the regime described in Theorem 3.2 is more applicable. Table
3.2 shows that the coverage probabilities based on (3.15) are very accurate even for the largest
scales $j_1 = 13$. We obtained these confidence intervals by using a Monte Carlo method. Namely,
we approximate the distribution of the statistics $\hat\alpha_Z$ based on 1,000 independent paths of i.i.d.
1-Fréchet variables, multiplied by the estimated $\hat\alpha_n$'s. Although these confidence intervals are
significantly slower to compute than (3.18), they exhibit excellent coverage probabilities even for
the largest scales $j_1$.
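
The Monte Carlo step can be sketched as follows. The simulation of the statistic for i.i.d. 1-Fréchet variables follows the description above; the use of OLS slope weights and the final inversion, which divides the estimate by simulated quantiles, are assumptions about one natural way to turn the simulated distribution into an interval, not a record of the authors' implementation.

```python
import math
import random


def frechet1(rng):
    """One draw from the 1-Frechet law, P(Z <= z) = exp(-1/z)."""
    u = rng.random()
    while u == 0.0:           # avoid log(0)
        u = rng.random()
    return 1.0 / (-math.log(u))


def slope_weights(ell):
    """Assumed OLS slope weights over scales 1..ell."""
    center = (ell + 1) / 2.0
    denom = sum((i - center) ** 2 for i in range(1, ell + 1))
    return [(i - center) / denom for i in range(1, ell + 1)]


def alpha_stat(z, ell):
    """1 / sum_i w_i * Y_i, with Y_i the max-spectrum of z at scales 1..ell."""
    w = slope_weights(ell)
    level, slope = list(z), 0.0
    for i in range(1, ell + 1):
        level = [max(level[2 * k], level[2 * k + 1]) for k in range(len(level) // 2)]
        slope += w[i - 1] * sum(math.log2(d) for d in level) / len(level)
    return 1.0 / slope


def monte_carlo_interval(alpha_n, r, level=0.95, reps=1000, rng=None):
    """Interval for alpha from simulated quantiles of the 1-Frechet statistic."""
    rng = rng or random.Random()
    ell = int(math.floor(math.log2(r)))
    draws = sorted(alpha_stat([frechet1(rng) for _ in range(r)], ell)
                   for _ in range(reps))
    lo_q = draws[int(math.floor((1.0 - level) / 2.0 * (reps - 1)))]
    hi_q = draws[min(reps - 1, int(math.ceil((1.0 + level) / 2.0 * (reps - 1))))]
    # If alpha_n / alpha behaves like the simulated statistic, invert the pivot.
    return alpha_n / hi_q, alpha_n / lo_q


mc_low, mc_high = monte_carlo_interval(1.5, r=16, level=0.9, reps=200,
                                       rng=random.Random(1))
```

The scaling works because an $\alpha$-Fréchet variable is a 1-Fréchet variable raised to the power $1/\alpha$, so its max-spectrum is that of the 1-Fréchet sample divided by $\alpha$.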

In conclusion, these brief numerical experiments suggest that the confidence intervals in (3.18)
work well in practice, even for dependent data, for a judicious choice of scales $j_1$ and $j_2$. The
confidence intervals based on Theorem 3.2, on the other hand, work well for all sufficiently large
scales, where the asymptotic normality may not apply. Both types of confidence intervals are
useful in practice.



4 On the automatic selection of the cut-off scale $j_1$



In the ideal case of $\alpha$-Fréchet i.i.d. data, the max-spectrum plot of $Y_j$ is linear in $j$. When the
distribution of the data is not Fréchet, or when the data are dependent, the max-spectrum is
asymptotically linear, as the scales $j$ tend to infinity. It is therefore important to select appropriately
the range of large scales $j$ for estimation purposes. In view of (2.6), one can always choose
$j_2 = [\log_2 n]$ to be the largest available scale and hence, the problem is reduced to choosing the
scale $j_1$, $1 \le j_1 < j_2$. The estimator of $\alpha$ is then obtained by performing a WLS or GLS linear
regression of $Y_j$ versus $j$, $j_1 \le j \le j_2$ (see (2.7)).
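
The regression step can be sketched in plain Python. The code computes the max-spectrum by dyadic block maxima and fits a weighted least-squares slope; weighting each scale by its number of block maxima is an assumption made here for illustration, and the GLS variant, which needs the covariance structure of the $Y_j$'s, is not shown.

```python
import math


def max_spectrum(x):
    """Return ({j: Y_j}, {j: n_j}) for positive data x, using dyadic block maxima."""
    level = list(x)
    spectrum, counts = {}, {}
    j = 0
    while len(level) >= 2:
        j += 1
        level = [max(level[2 * k], level[2 * k + 1]) for k in range(len(level) // 2)]
        spectrum[j] = sum(math.log2(d) for d in level) / len(level)
        counts[j] = len(level)
    return spectrum, counts


def wls_slope(spectrum, weights, j1, j2):
    """Weighted least-squares slope of Y_j on j over the scales j1..j2."""
    scales = range(j1, j2 + 1)
    total = sum(weights[j] for j in scales)
    j_bar = sum(weights[j] * j for j in scales) / total
    y_bar = sum(weights[j] * spectrum[j] for j in scales) / total
    num = sum(weights[j] * (j - j_bar) * (spectrum[j] - y_bar) for j in scales)
    den = sum(weights[j] * (j - j_bar) ** 2 for j in scales)
    return num / den


spec, n_j = max_spectrum([1.0, 2.0, 4.0, 8.0])
h_hat = wls_slope(spec, n_j, 1, 2)   # slope estimates H = 1/alpha
alpha_hat = 1.0 / h_hat
```

The toy input has only two scales, so the fitted slope is simply the difference $Y_2 - Y_1$; real data would use the range $j_1 \le j \le j_2$ discussed above.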

The "cut-off" parameter $j_1$ can be selected either by visually inspecting the max-spectrum or
through a data-driven procedure. In Stoev et al. (2006) an automatic procedure for selecting the
cut-off parameter was proposed, in the case of independent data, whose main steps are briefly
summarized next. We also demonstrate that it performs satisfactorily for dependent data. The
algorithm sets $j_2 := [\log_2 n]$ and $j_1 := \max\{1, j_2 - b\}$, with $b = 3$ or $4$ in practice for moderate
sample sizes. Next, $j_1$ is iteratively decreased until statistically significant deviations from linearity
of $Y_j$, $j_1 \le j \le j_2$, are detected. Namely, while $j_1 > 1$, at each iteration over the scale $j_1$ the following
two quantities are calculated: $\hat H_{new} = \hat H(j_1 - 1, j_2)$ and $\hat H_{old} = \hat H(j_1, j_2)$. Whenever the value of
zero is not contained in a confidence interval centered at $(\hat H_{new} - \hat H_{old})$, the algorithm stops and
returns the selected $j_1$ and $\hat\alpha = 1/\hat H_{old}$; otherwise, it sets $j_1 := j_1 - 1$ and proceeds accordingly.
The construction of the confidence interval about $(\hat H_{new} - \hat H_{old})$ utilizes the covariance matrix $\Sigma_\ell$
in Theorem 3.1, which is the same as in the i.i.d. case; see Stoev et al. (2006). The asymptotic
normality result suggests that the methodology in the case of i.i.d. data applies asymptotically to
dependent data, for moderately large scales $j_1$. Alternatively, the results of Theorem 3.2 may be
used to suitably correct the confidence intervals on the largest scales $j_1$. We did not implement this
method, since it is computationally demanding in practice.
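
The main loop of the selection procedure can be sketched as follows. The standard error of the difference of the two slope estimates depends on the covariance matrix in Theorem 3.1, which is not reproduced here; it therefore enters as a user-supplied function `se_diff` (hypothetical name), and an unweighted least-squares slope stands in for the paper's regression.

```python
def ols_slope(spectrum, j1, j2):
    """Unweighted least-squares slope of Y_j on j over scales j1..j2 (j1 < j2)."""
    scales = list(range(j1, j2 + 1))
    j_bar = sum(scales) / len(scales)
    y_bar = sum(spectrum[j] for j in scales) / len(scales)
    num = sum((j - j_bar) * (spectrum[j] - y_bar) for j in scales)
    den = sum((j - j_bar) ** 2 for j in scales)
    return num / den


def select_cutoff(spectrum, se_diff, z_crit=2.576, b=3):
    """Decrease j1 from j2 - b until the slope changes significantly.

    se_diff(j1_new, j2) is an assumed standard error of H(j1_new, j2) - H(j1_new + 1, j2).
    Returns the selected j1 and the estimate of alpha.
    """
    j2 = max(spectrum)
    j1 = max(1, j2 - b)
    while j1 > 1:
        h_new = ols_slope(spectrum, j1 - 1, j2)
        h_old = ols_slope(spectrum, j1, j2)
        if abs(h_new - h_old) > z_crit * se_diff(j1 - 1, j2):
            break              # zero lies outside the confidence interval: stop
        j1 -= 1
    return j1, 1.0 / ols_slope(spectrum, j1, j2)


# Toy spectrum: linear with slope 1/1.5 for j >= 5, flatter (a knee) below.
toy = {j: (j / 1.5 if j >= 5 else 5 / 1.5 - 0.2 * (5 - j)) for j in range(1, 11)}
j1_sel, alpha_sel = select_cutoff(toy, se_diff=lambda j1, j2: 0.01)
```

In this toy example the loop extends the range down to the knee and stops as soon as including one more scale would shift the slope by more than the threshold.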

Figure 4 demonstrates the performance of the automatic selection procedure in the case of
dependent data. Even though the marginal distributions of $X$ are Fréchet, the dependence causes
a knee in the max-spectrum plot (see, e.g., Figure 3). The automatic selection procedure picks
up this "knee" and yields reasonably unbiased and precise automatic estimates of $\alpha$ (see the top-right
panel in Figure 4). Comparing the MSE plot and the histogram of the selected $j_1$ values,
we see that over 70% of the time the value $j_1 = 5$ was chosen, which is close to the optimal
value of $j_1 = 6$. The histogram of the resulting automatic estimates of $\alpha$ (top-right panel) is
similar (with the exception of a few outliers) to the histogram of the estimators corresponding to
the MSE-optimal $j_1 = 6$ (bottom-right panel).

Recall Table 3.1 and observe that the case $\phi = 0.9$ corresponds to the time series analyzed
in Figure 4. The coverage probabilities of the confidence intervals for $\alpha$ essentially match the
nominal levels, for $j_1 \ge 8$. On the other hand, the MSE-optimal value is $j_1 = 6$ (Figure 4), which
is only slightly smaller than $j_1 = 8$. This can be attributed to the fact that the bias involved in
the estimators at $j_1 = 6$, although comparable to their standard errors, is significant and noticeably
shifts the confidence interval. As the scale $j_1$ grows, the bias quickly becomes negligible and the
resulting confidence intervals become accurate.

Figure 4: The top-left plot shows the histogram of automatically selected $j_1$ values for 1,000 independent
samples of size $N = 2^{15}$ from an exponential moving maxima $\alpha$-Fréchet process, $X = \{X(k)\}_{k\in\mathbb{Z}}$,
defined as in (3.19) with $\phi = 0.9$ and with i.i.d. 1.5-Fréchet innovations. The significance level and
back-start parameters are $p = 0.01$ and $b = 4$, respectively. The top-right plot shows the histogram of the
resulting $\hat\alpha = 1/\hat H$ estimates. The bottom-left plot shows estimates of the square root of the mean squared
error (MSE) $\mathbb{E}(\hat H - H)^2$ as a function of $j_1$. The bottom-right plot contains a histogram of $\hat\alpha$ estimates
obtained with the MSE-optimal choice of $j_1 = 11$.

These brief experiments suggest that the automatic procedure is practical and works reasonably 
well in the case of dependent moving maxima time series. Similar experiments for independent 
heavy-tailed data (not shown here) indicate that the automatic selection procedure continues to 
perform well and chooses values of $j_1$ close to the MSE-optimal ones, thus making it appropriate
for use in empirical work. Nevertheless, a detailed study of its performance under a combination
of heavy-tailed distributions and dependence structures, as well as its sensitivity to the choice of
the back-start parameter $b$ and the level of significance $p$, is necessary and the subject of future
work.

5 Applications to Financial Data

We analyze market transactions for two stocks, Intel (symbol INTC) and Google (GOOG), using
the max-spectrum. The data sets were obtained from the Trades and Quotes (TAQ) data
base of consolidated transactions of the New York Stock Exchange (NYSE) and NASDAQ (see
Wharton Research Data Services (url)) and include the following information about every single
trade of the underlying stock: time of transaction (up to seconds), price (of the share) and volume
(in number of shares). In our analysis, we focus on the traded volumes of the two stocks
for November 2005, which could provide information about the respective sector's, as well as the
market's, economic conditions (Lo and Wang (2000)).



A ubiquitous feature of the volume data sets is the presence of heavy, Pareto-type tails, as
can be seen in Figure 6. Specifically, the top panel shows transaction volumes for the Google
stock on November 7, 2005, while the bottom panels show the Hill and the max-spectrum plots,
respectively. The tail exponent, estimated from the max-spectrum over the range of scales (11, 15),
is $\hat\alpha = 1.0729$. The Hill plot indicates heavy-tail exponent estimates between 1.5 and 2, which
correspond to the slope of the max-spectrum over the range of scales (1, 10). The small dip in
the Hill plot for very large order statistics (small values of $k$) can be related to the behavior of
the max-spectrum for scales (11, 15). Such behavior, as well as the presence of non-stationarity
and dependence, is typical for almost all liquid stocks. In order to minimize the intricate
non-stationarity effects, we focus here on traded volumes within a day. The max-spectrum yields
consistent tail exponent estimates even in the presence of dependence. This fact and the robustness
of the max-spectrum suggest that it may be safely used in various practical scenarios involving
heavy-tailed data. In Figure 5 we show the max self-similarity estimates of the tail exponents
for each of the 21 trading days in November 2005. The max-spectra of these 21 time series (not
shown here) of trading volumes are essentially linear. This confirms the validity of a heavy-tailed
model for the data over a wide range of time scales, from seconds up to hours and days.
Further, at the beginning and end of the trading day, several large volume transactions are observed,
as documented in Hong and Wang (2000). Nevertheless, the max-spectrum of the trading activity of
Google remains essentially linear over the period under study, with a few bumps at the largest
scales due to diurnal effects and other non-stationarities.

In Figure 5 the daily tail exponent estimates are shown for the Google stock, which fluctuate
between 1 and 2, along with pointwise confidence intervals (broken lines). These estimates indicate
that the tail exponent exhibits a significant degree of variability over the period of a month, and that
an infinite variance model may be most appropriate for modeling trading volumes. For example,
on November 7 (see Figure 6), the estimate of $\alpha$ is nearly 1, which may be due to the several
extremely large peaks in the volume data. The upward knee in the max-spectrum of this data set is
likely caused by these peaks. The max-spectra on most other days are much closer to linear than
the one in Figure 6. Such correspondence between the presence of large peaks in the data and the
behavior of the max-spectrum can be used to identify statistically significant fluctuations in the
volume data. Hence, the max-spectrum plot can be used not only to estimate $\alpha$, but also to detect
changes in the market. We illustrate this last point next, by examining an unusual trading pattern
in the Intel stock towards the end of November 2005.

Figure 5: Top panel: traded volumes for the Google stock from the TAQ data base of consolidated trades
of NYSE and NASDAQ for the month of November 2005. The x-axis and y-axis correspond to time and
number of traded shares, respectively. This is a high-frequency data set, where each data point corresponds
to the volume of a single transaction and no temporal aggregation is performed. The gaps of zeros in the data
correspond to hours of the day with no trading and/or weekends. Bottom panel: estimated tail exponents
(indicated by circles) from the max-spectrum and their corresponding 95% confidence intervals (indicated
by broken lines), based on the asymptotic expression in (3.18). Automatic selection of the cut-off scale $j_1$
was done with $p = 0.1$ and $b = 3$ (see Section 4). Every estimate was computed from a day's worth of
transaction volumes.



Figure 7 shows the max-spectrum estimates of the tail exponents for the traded volumes of the
Intel stock for 21 trading days in November 2005. Notice that up to November 21, the tail exponent
is fairly constant, fluctuating between 1.2 and 2. On November 22 (Tue) and 23 (Wed), before the
Thanksgiving holiday on November 24 (Thu), the tail exponent takes values larger than 3 and
5, respectively. This change is quite surprising and it is deemed significant by the corresponding
confidence intervals. A closer look at the data from November 23 (Figure 8) shows a changing but
persistent pattern of trading as compared to November 21 (see, for example, Figure 9).

This behavior proves persistent and continues on November 25, after the Thanksgiving holiday.
Moreover, no such behavior was observed for the Google data on any of the 21 trading days in
November 2005. Although trading of extremely large volumes occurs on November 23, as seen in
Figure 8, these trades are very regular and hence inconsistent with a heavy-tailed model. Although
regular in time, these large transactions occur on a time scale of several minutes, and hence the
small scales of the max-spectrum are not affected by these peaks and behave as on a normal trading
day (see Figure 9). However, the large peaks dominate the larger scales $j$ and their regularity makes
the max-spectrum essentially horizontal. The Hill plot, shown on the bottom-left panel of Figure
8, fails to pick up the unusual behavior, since it suggests values of $\alpha \approx 1$, which correspond only
to the smallest portion of the max-spectrum, where $\hat\alpha(7, 11) = 1.0578 \approx 1$.

Figure 6: Top panel: the transaction volumes during the trading hours of November 7, 2005. The x-axis
corresponds to the number of the transaction and the y-axis to the number of shares. Note that about 50,000
transactions occurred on this day, which is typical for the Google stock. Observe also the fairly classical
heavy-tailed nature of the volume data. Bottom panels: the Hill plot (left) and the max-spectrum (right)
of the data. The Hill plot is zoomed in to a range where it is fairly constant and a tail exponent between
1.5 and 2 can be identified. The max-spectrum reveals more: on large scales the plot is steeper than on
small scales, with the tail exponent about 1 on the range of scales (11, 15) and exponent about 1.7 on scales
(1, 10). The presence of a knee in the max-spectrum plot suggests different behavior of the largest volumes
on large time scales than on small time scales and can be attributed to the several very large spikes of over
20,000 traded shares (about 5 million US dollars) in the top plot.

Our best guess is that this change in activity is related to the approval by the board of directors
of Intel Corp. on November 10 of a stock buy-back program worth up to 25 billion
US dollars (see, e.g., the Financial Times, London, on Thursday November 11, page 27); hence,
some of the delayed effects of the announcement of the program and the market reaction to it are
demonstrated in the volume activity discussed above.

Figure 7: This figure has the same format as Figure 5. On the top panel, the traded volumes of the Intel
stock for the month of November 2005 are shown. Observe that the tail exponent estimates on the bottom
plot fluctuate between 1.5 and 2 up to November 21. On and after November 22, unusually high values of $\alpha$
appear (compare with the case of the Google stock in Figure 5). This is further analyzed in Figures 8 and 9
below.

6 Appendix

6.1 Rates of convergence for moment functionals of dependent maxima 

Proposition 6.1 Suppose that $f : (0, \infty) \to \mathbb{R}$ is an absolutely continuous function on any compact
interval $[a, b] \subset (0, \infty)$, and such that $f(x) = f(x_0) + \int_{x_0}^{x} f'(u)\,du$, $x > 0$, for some (any)
$x_0 > 0$.

Let for some $m \in \mathbb{R}$ and $\delta > 0$,

$$x^{m}|f(x)| + \operatorname{ess\,sup}_{0 < y \le x}\, y^{m}|f'(y)| \longrightarrow 0, \quad \text{as } x \downarrow 0, \tag{6.1}$$

$$x^{-\alpha}|f(x)| + x^{1+\delta} \operatorname{ess\,sup}_{y \ge x}\, y^{-\alpha}|f'(y)| \longrightarrow 0, \quad \text{as } x \to \infty. \tag{6.2}$$

Suppose also that the time series $X = \{X_n\}_{n\in\mathbb{Z}}$ satisfies Conditions 3.1 and 3.2, where $c_1(x)$ is
such that

$$\int_{1}^{\infty} c_1(x)\, x^{-\alpha} |f'(x)|\,dx < \infty. \tag{6.3}$$

Then, $\mathbb{E}|f(M_n)| < \infty$, for all sufficiently large $n \in \mathbb{N}$, and for some $C_f > 0$, independent of $n$,

$$\big|\mathbb{E} f(M_n/n^{1/\alpha}) - \mathbb{E} f(Z)\big| \le C_f\, n^{-\beta}, \tag{6.4}$$

where $Z$ is an $\alpha$-Fréchet variable with scale coefficient $c_X^{1/\alpha}$.

Figure 8: Top panel: traded volumes of the Intel stock for November 23, 2005. Observe the regular
occurrence of many very large trades of approximately the same sizes: 10,000, 15,000, 25,000 and a few
of 20,000 shares. This is very unusual behavior of the volume data, as compared to a typical trading day
(see, e.g., Figure 9). Bottom panels: the Hill plot and the max-spectrum of the data. Notice that the Hill plot
fails to identify the unusual behavior of the data, whereas the max-spectrum flattens out on large scales due
to the regular, non-heavy-tailed behavior of the largest traded volumes. Once identified on the max-spectrum
plot, one can perhaps read off these details from the volatile Hill plot for very small values of $k$. On small
scales, where the regular large transactions are not frequent and do not play a role, the max-spectrum yields
tail exponents about 1. This is in line with the Hill plot.

Figure 9: This figure has the same format as Figure 8. The top plot shows the volumes of INTC during
November 21, 2005, which, like the volumes of GOOG in Figure 6, behave like a classical heavy-tailed
sample. The Hill plot and the max-spectrum (bottom left and right panels, respectively) identify tail exponents
around 1.5. The cut-off scale in the max-spectrum plot was selected automatically with $p = 0.1$ and $b = 3$
(as in Figure 7). Notice the volatile, saw-tooth shape of the Hill plot, which is due to its non-robustness to
deviations from the Pareto model. The max-spectrum is more robust and fairly linear with a small knee on
scale $j = 12$, which may be due to a few clusters of large volumes at the beginning and at the end of the
trading day.



Proof: The proof is similar to the proof of Theorem 3.1 in Stoev et al. (2006). Indeed, as in the
above reference, one can show that $\mathbb{E}|f(Z)| < \infty$ and $\mathbb{E}|f(M_n)| < \infty$, for all sufficiently large $n$.
Further, by using the conditions (6.1) and (6.2) and integration by parts, we have that

$$\mathbb{E} f(M_n/n^{1/\alpha}) - \mathbb{E} f(Z) = \int_{0}^{\infty} \big(G(x) - F_n(x)\big) f'(x)\,dx, \tag{6.5}$$

where $F_n(x) := \mathbb{P}\{M_n/n^{1/\alpha} \le x\}$ and $G(x) = \mathbb{P}\{Z \le x\}$. Since $F_n(x) = e^{-c(n,x)x^{-\alpha}}$, by the
mean value theorem, we have

$$|G(x) - F_n(x)| = \big|e^{-c_X x^{-\alpha}} - e^{-c(n,x)x^{-\alpha}}\big| \le |c(n,x) - c_X|\, x^{-\alpha} e^{-\min\{c_X,\, c(n,x)\}x^{-\alpha}}$$
$$\le n^{-\beta} c_1(x)\, x^{-\alpha} \big(e^{-c_2 x^{-(\alpha-\gamma)}} + e^{-c_X x^{-\alpha}}\big),$$

where in the last inequality, we used Relations (3.1) and (3.2).

Thus, by (6.5), we have that

$$\big|\mathbb{E} f(M_n/n^{1/\alpha}) - \mathbb{E} f(Z)\big| \le n^{-\beta} \int_{0}^{\infty} c_1(x)\, x^{-\alpha} |f'(x)| \big(e^{-c_2 x^{-(\alpha-\gamma)}} + e^{-c_X x^{-\alpha}}\big)\,dx =: n^{-\beta}\Big(\int_0^1 + \int_1^\infty\Big). \tag{6.6}$$

The last integral is finite. Indeed, since the exponential terms above are bounded, Relation (6.3)
implies that the integral "$\int_1^\infty$" is finite. On the other hand, conditions (3.1) and (6.1) imply that
$c_1(x)|f'(x)| = O(x^{-R})$, $x \downarrow 0$, for some $R \in \mathbb{R}$. However, for all $p > 0$, we have
$\big(e^{-c_2 x^{-(\alpha-\gamma)}} + e^{-c_X x^{-\alpha}}\big) = o(x^{p})$, $x \downarrow 0$, since $\alpha - \gamma > 0$. This implies that the integral "$\int_0^1$" in (6.6) is also
finite. This completes the proof of (6.4). □

Proof of Proposition 3.1: It is enough to show that the functions $f(x) := |\ln(x)|^{p}$ and
$f(x) := (\ln(x))^{k}$, $p > 0$, $k \in \mathbb{N}$, satisfy the conditions of Proposition 6.1. In the first case, for
example, $|f'(x)| = p\,x^{-1}|\ln(x)|^{p-1}$, $x > 0$. Therefore, the assumption $\int_{1}^{\infty} c_1(x)\,x^{-\alpha-1+\delta}\,dx < \infty$
implies (6.3), since $|\ln(x)|^{p-1} \le \text{const}\; x^{\delta}$, for all $x \in [1, \infty)$. The conditions (6.1) and (6.2) are
also fulfilled in this case, and hence Proposition 6.1 yields the desired order of convergence. The
functions $f(x) = (\ln(x))^{k}$, $k \in \mathbb{N}$, can be treated similarly. □

In the rest of this section we demonstrate that Conditions 3.1 and 3.2 apply to a general class
of moving maxima processes.

Let $\{Z_n\}_{n\in\mathbb{N}}$ be a sequence of i.i.d. random variables with the cumulative distribution function
$\mathbb{P}\{Z \le z\} = F_Z(z)$. As in Stoev et al. (2006), we suppose that

$$F_Z(z) = \exp\{-c(z)\,z^{-\alpha}\}, \quad z > 0, \tag{6.7}$$

and impose two further conditions, analogous to Conditions 3.1 and 3.2.

Condition 6.1 There exists $\beta' > 0$, such that

$$|c(z) - c_Z| \le K z^{-\beta'}, \quad \text{for all } z > 0, \tag{6.8}$$

where $c_Z > 0$ and $K \ge 0$.

Condition 6.2 $F_Z(0) = 0$ and for all $z > 0$,

$$c(z) \ge c \min\{1, z^{\gamma}\}, \quad \text{for some } \gamma \in (0, \alpha), \tag{6.9}$$

with $c > 0$.

Observe that (6.8) implies $c(z) \to c_Z$, $z \to \infty$, and in fact $\mathbb{P}\{Z > z\} = 1 - F_Z(z) \sim c_Z z^{-\alpha}$, as
$z \to \infty$. Define now the moving maxima process $X = \{X_k\}_{k\in\mathbb{Z}}$:

$$X_k := \max_{1 \le i \le m} a_i Z_{k-i+1}, \quad k \in \mathbb{Z}, \tag{6.10}$$

with some coefficients $a_i > 0$, $i = 1, \ldots, m$, and $m \ge 1$. The following result shows that the
process $X$ satisfies Conditions 3.1 and 3.2.

Proposition 6.2 If the $Z_n$'s satisfy Conditions 6.1 and 6.2, then the process $X = \{X_k\}_{k\in\mathbb{Z}}$ in
(6.10) satisfies (1.1), Conditions 3.1 and 3.2, with $\gamma$ as in (6.9),

$$\sigma_0^{\alpha} = c_Z \sum_{i=1}^{m} a_i^{\alpha}, \quad \beta = \min\{1, \beta'/\alpha\} \quad \text{and} \quad c_1(x) := \text{const}\,(1 + x^{-\beta'}), \tag{6.11}$$

where $\beta'$ is as in (6.8) and where $c_X := c_Z \max_{1\le i\le m} a_i^{\alpha}$. In particular, the extremal index of $X$ is
$\theta = c_X/\sigma_0^{\alpha} = \max_{1\le i\le m} a_i^{\alpha} \big/ \sum_{i=1}^{m} a_i^{\alpha}$.
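
A short sketch may help fix the indexing in (6.10) and the extremal index formula of Proposition 6.2. The function names are illustrative; the list `a` holds $a_1, \ldots, a_m$ in order, and the sequence `z` is any list of positive numbers.

```python
def moving_maxima(z, a):
    """X_k = max_{1 <= i <= m} a_i * Z_{k-i+1}, for every k with a full history."""
    m = len(a)
    return [max(a[i] * z[k - i] for i in range(m)) for k in range(m - 1, len(z))]


def extremal_index(a, alpha):
    """theta = max_i a_i**alpha / sum_i a_i**alpha, as in Proposition 6.2."""
    powers = [coef ** alpha for coef in a]
    return max(powers) / sum(powers)


xs = moving_maxima([1.0, 5.0, 2.0, 1.0], [1.0, 0.5])
theta_equal = extremal_index([1.0, 1.0], alpha=1.0)
```

With two equal coefficients, each large innovation appears in two consecutive values of the process, which matches $\theta = 1/2$.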

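Proof: We first derive the marginal distribution of the $X_k$'s. By (6.7) and (6.10), we have

$$\mathbb{P}\{X_k \le x\} = \mathbb{P}\{Z_k \le x/a_1, \ldots, Z_{k-m+1} \le x/a_m\} = \exp\Big\{-\sum_{i=1}^{m} c(x/a_i)\, a_i^{\alpha} x^{-\alpha}\Big\}.$$

Thus, in view of (6.8), $c(x/a_i) \to c_Z$, $x \to \infty$, and hence, as $x \to \infty$,

$$\mathbb{P}\{X_k > x\} \sim \sigma_0^{\alpha} x^{-\alpha}, \quad \text{where } \sigma_0^{\alpha} := c_Z \sum_{i=1}^{m} a_i^{\alpha}. \tag{6.12}$$

We now focus on the maxima $M_n := \max_{1\le j\le n} X_j$. For $n > m$, and $x > 0$, we have that
$F_n(x) := \mathbb{P}\{M_n/n^{1/\alpha} \le x\}$ equals

$$F_n(x) = \mathbb{P}\{X_1 \le n^{1/\alpha}x, \ldots, X_n \le n^{1/\alpha}x\}$$
$$= \mathbb{P}\Big\{\bigvee_{j=2-m}^{0} g_{j,m} Z_j \le n^{1/\alpha}x,\ \bigvee_{j=1}^{n-m+1} a_{(1)} Z_j \le n^{1/\alpha}x,\ \bigvee_{j=0}^{m-2} h_j Z_{n-j} \le n^{1/\alpha}x\Big\},$$

where

$$a_{(1)} := \bigvee_{k=1}^{m} a_k, \quad g_{j,m} := \bigvee_{k=2-j}^{m} a_k, \quad h_j := \bigvee_{k=1}^{1+j} a_k.$$

Therefore, by using the independence of the $Z_j$'s and Relation (6.7), we get $F_n(x) = \exp\{-c(n,x)x^{-\alpha}\}$, $x > 0$, where

$$c(n,x) = \frac{1}{n}\Big(\sum_{j=2-m}^{0} c(n^{1/\alpha}x/g_{j,m})\, g_{j,m}^{\alpha} + (n - m + 1)\, c(n^{1/\alpha}x/a_{(1)})\, a_{(1)}^{\alpha} + \sum_{j=0}^{m-2} c(n^{1/\alpha}x/h_j)\, h_j^{\alpha}\Big). \tag{6.13}$$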
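
The bookkeeping behind (6.13) can be verified numerically. The sketch below compares, for each innovation $Z_j$, the largest coefficient with which it enters $\max_{1 \le k \le n} X_k$ (found by brute force) against the three groups of coefficients written in closed form. It also evaluates $c(n,x)$ in the special case of exactly Fréchet innovations, where $c(z) \equiv c_Z$ is constant; that special case and all names are assumptions made for illustration.

```python
def brute_force_coefficients(a, n):
    """Largest coefficient multiplying each Z_j (j = 2-m..n) in max_{k<=n} X_k."""
    m = len(a)
    coef = {}
    for k in range(1, n + 1):
        for i in range(1, m + 1):       # X_k contains a_i * Z_{k-i+1}
            j = k - i + 1
            coef[j] = max(coef.get(j, 0.0), a[i - 1])
    return coef


def closed_form_coefficients(a, n):
    """The same coefficients via g_{j,m}, a_(1) and h_j (requires n > m)."""
    m = len(a)
    coef = {}
    for j in range(2 - m, 1):           # g_{j,m} = max of a_k over k = 2-j..m
        coef[j] = max(a[1 - j:])
    for j in range(1, n - m + 2):       # a_(1) = max of all a_k
        coef[j] = max(a)
    for jj in range(0, m - 1):          # h_j = max of a_k over k = 1..1+j, for Z_{n-j}
        coef[n - jj] = max(a[:jj + 1])
    return coef


def c_n_frechet(a, n, alpha, c_z=1.0):
    """c(n, x) when c(z) is identically c_z (it then does not depend on x)."""
    return c_z * sum(v ** alpha for v in closed_form_coefficients(a, n).values()) / n


coeffs = [0.3, 1.0, 0.6]
same = brute_force_coefficients(coeffs, 7) == closed_form_coefficients(coeffs, 7)
c_large_n = c_n_frechet(coeffs, n=1000, alpha=1.5)
```

In this constant case the difference between $c(n,x)$ and $c_Z a_{(1)}^{\alpha}$ is of order $1/n$ and comes only from the finitely many boundary terms.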
We will now show that Relation (3.1) holds with $\beta$ and $c_1(\cdot)$ as in (6.11). Let $c_X := c_Z a_{(1)}^{\alpha} = c_Z \max_{1\le i\le m} a_i^{\alpha}$. By (6.13), we have

$$|c(n,x) - c_X| = |c(n,x) - c_Z a_{(1)}^{\alpha}|$$
$$\le \frac{1}{n}\sum_{j=2-m}^{0} \big|c(n^{1/\alpha}x/g_{j,m}) - c_Z\big|\, g_{j,m}^{\alpha} + \frac{n-m+1}{n}\big|c(n^{1/\alpha}x/a_{(1)}) - c_Z\big|\, a_{(1)}^{\alpha}$$
$$+ \frac{1}{n}\sum_{j=0}^{m-2} \big|c(n^{1/\alpha}x/h_j) - c_Z\big|\, h_j^{\alpha} + \frac{C}{n} =: A_1 + A_2 + A_3 + \frac{C}{n}, \tag{6.14}$$

where the constant $C$ does not depend on $x$. In the last relation, we add and subtract the finite
number of $2(m-1)$ terms of the type $g_{j,m}^{\alpha} c_Z$ and $h_j^{\alpha} c_Z$ and apply the triangle inequality.
Now, by applying Relation (6.8) to each one of the absolute value terms in $A_1$, we obtain

$$A_1 \le \frac{1}{n}\sum_{j=2-m}^{0} K\, n^{-\beta'/\alpha} x^{-\beta'} g_{j,m}^{\alpha+\beta'} \le \frac{m-1}{n^{1+\beta'/\alpha}}\, K a_{(1)}^{\alpha+\beta'} x^{-\beta'} =: \frac{C_1}{n^{1+\beta'/\alpha}}\, x^{-\beta'}, \tag{6.15}$$

where the constant $C_1$ does not depend on $n$ and $x$ and where in the last inequalities we used that
$g_{j,m} \le a_{(1)}$. One obtains a similar bound for the term $A_3$ in (6.14):

$$A_3 \le \frac{C_3}{n^{1+\beta'/\alpha}}\, x^{-\beta'}, \tag{6.16}$$

where the constant $C_3$ does not depend on $n$ and $x$.

Now, for the term $A_2$ in (6.14), we also have by (6.8) that

$$A_2 \le \frac{n-m+1}{n}\, K a_{(1)}^{\alpha+\beta'} x^{-\beta'} n^{-\beta'/\alpha} \le \frac{C_2}{n^{\beta'/\alpha}}\, x^{-\beta'}, \tag{6.17}$$

where the constant $C_2$ does not depend on $n$ and $x$.

By combining the bounds in (6.15)-(6.17), for the terms in (6.14), we obtain

$$|c(n,x) - c_X| \le \frac{C_1 + C_3}{n^{1+\beta'/\alpha}}\, x^{-\beta'} + \frac{C_2}{n^{\beta'/\alpha}}\, x^{-\beta'} + \frac{C}{n},$$

which shows that (3.1) holds with $c_1(x) = \text{const}\,(1 + x^{-\beta'})$, where $\beta := \min\{1, \beta'/\alpha\}$.

We now show that (3.2) holds. Since (3.2) involves a lower bound, we can ignore the two
positive sums in (6.13). Recall (6.9) and note that $c(n^{1/\alpha}x/a_{(1)}) \ge c \min\{1, (n^{1/\alpha}x/a_{(1)})^{\gamma}\}$.
Since, for sufficiently large $n$, $n^{1/\alpha} \ge a_{(1)}$, and $(n^{1/\alpha}x/a_{(1)})^{\gamma} \ge x^{\gamma}$, we obtain $c(n^{1/\alpha}x/a_{(1)}) \ge c \min\{1, x^{\gamma}\}$. Therefore, by (6.13), since for all sufficiently large $n$, $(n - m + 1)/n \ge 1/2$, we
have $c(n,x) \ge c_2 \min\{1, x^{\gamma}\}$, where $c_2 = a_{(1)}^{\alpha} c/2$. This implies (3.2) and completes the proof of
the proposition. □

6.2 Auxiliary lemmas 

The next three lemmas were used in the proof of Theorem [3TTJ 

Lemma 6.1 Under the conditions of Theorem 3.1, for all $j > \log_2 m$, we have

\[
\mathrm{Var}(\widetilde Y_j - Y_j) \le \frac{3}{n_j}\, \mathrm{Var}\bigl(\log_2(\widetilde D(j,1)/D(j,1))\bigr).
\]
Proof: For notational simplicity, let $\xi_k := \log_2(\widetilde D(j,k)/D(j,k))$, $k = 1, \ldots, n_j$. We have, by the stationarity of $\xi_k$ in $k$, that

\[
\mathrm{Var}(\widetilde Y_j - Y_j) = \frac{1}{n_j}\mathrm{Var}(\xi_1) + \frac{2}{n_j^2}\sum_{k=1}^{n_j-1} (n_j - k)\, \mathrm{Cov}(\xi_{k+1}, \xi_1).
\]

Note that $\xi_{k+1} = \log_2(\widetilde D(j,1+k)/D(j,1+k))$ and $\xi_1 = \log_2(\widetilde D(j,1)/D(j,1))$ are independent if $k > 1$. Indeed, this follows from the fact that the process $X$ is $m$-dependent, and since $\xi_{k+1}$ and $\xi_1$ depend on blocks of the data separated by at least $2^j > m$ lags. Therefore, only the lag-1 covariances in the above sum will be non-zero and hence

\[
\mathrm{Var}(\widetilde Y_j - Y_j) \le \frac{1}{n_j}\mathrm{Var}(\xi_1) + \frac{2}{n_j}\,|\mathrm{Cov}(\xi_2, \xi_1)| \le \frac{3}{n_j}\mathrm{Var}(\xi_1),
\]

since by the Cauchy-Schwarz inequality we have $|\mathrm{Cov}(\xi_2, \xi_1)| \le \mathrm{Var}(\xi_2)^{1/2}\,\mathrm{Var}(\xi_1)^{1/2} = \mathrm{Var}(\xi_1)$. This completes the proof of the lemma. □
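The variance bound in Lemma 6.1 can be checked numerically for any stationary sequence whose covariances vanish beyond lag 1. The following Python sketch is only an illustration: the function names `var_of_mean` and `lemma_bound` and the example sequence (a sum of two consecutive i.i.d. unit-variance terms, so that the lag-0 and lag-1 covariances are 2 and 1) are assumptions made here and do not come from the paper.

```python
def var_of_mean(gamma, n):
    """Exact variance of the sample mean of n terms of a stationary sequence.

    gamma[k] is the lag-k autocovariance; lags beyond len(gamma) - 1 are
    treated as zero (finite-range dependence).
    """
    total = gamma[0] / n
    for k in range(1, min(n, len(gamma))):
        total += 2.0 * (n - k) * gamma[k] / n**2
    return total


def lemma_bound(gamma0, n):
    """Right-hand side of the bound: 3 * Var(xi_1) / n."""
    return 3.0 * gamma0 / n


# Hypothetical example: xi_k = U_k + U_{k+1} with i.i.d. unit-variance U's,
# so Var(xi_1) = 2, Cov(xi_2, xi_1) = 1 and all higher-lag covariances vanish.
gamma = [2.0, 1.0]
for n in (2, 10, 1000):
    print(n, var_of_mean(gamma, n), lemma_bound(gamma[0], n))
```

In this example the exact variance stays below the bound for every `n`, with the gap reflecting the slack in the Cauchy-Schwarz step.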

Lemma 6.2 Under the conditions of Theorem 3.1, for any fixed $k$, we have $\widetilde D(j,k)/D(j,k) \to 1$ in probability, as $j \to \infty$.

Proof: Let $\delta \in (0, 1/\alpha)$ be arbitrary and observe that

\[
\mathbb{P}\{\widetilde D(j,k)/D(j,k) < 1\} = \mathbb{P}\{R > \widetilde D(j,k)\} \le \mathbb{P}\{R > 2^{j\delta}\} + \mathbb{P}\{2^{j\delta} \ge \widetilde D(j,k)\},
\tag{6.18}
\]

where $R = \max_{1\le i\le m} X_{2^j(k-1)+i}$. Now, by stationarity,

\[
\mathbb{P}\{R > 2^{j\delta}\} = \mathbb{P}\Bigl\{\max_{1\le i\le m} X_i > 2^{j\delta}\Bigr\} \to 0, \quad \text{as } j \to \infty.
\]

On the other hand, Relation (3.1) implies that $2^{-j/\alpha}\widetilde D(j,k)$ converges in distribution to $Z$, as $j \to \infty$, where $Z$ is a non-degenerate $\alpha$-Fréchet variable. Thus, since $\delta \in (0, 1/\alpha)$, we have that

\[
\mathbb{P}\{2^{j\delta} \ge \widetilde D(j,k)\} \to 0, \quad \text{as } j \to \infty.
\]

The last two convergences and the inequality (6.18) imply that $\mathbb{P}\{\widetilde D(j,k)/D(j,k) < 1\} \to 0$, as $j \to \infty$. Since trivially $\mathbb{P}\{\widetilde D(j,k)/D(j,k) \le 1\} = 1$, we obtain that $\widetilde D(j,k)/D(j,k)$ converges in distribution to the constant 1, as $j \to \infty$. This completes the proof since convergence in distribution to a constant implies convergence in probability. □
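The convergence in Lemma 6.2 can be illustrated by simulation. The Python sketch below rests on assumptions that are not taken from the paper: the process is a moving maximum of `m` consecutive i.i.d. Pareto innovations, and the trimmed block maximum drops the first `m` observations of each dyadic block, which is one reading of the variable $R$ in the proof. The names `moving_maxima`, `block_ratios` and `fraction_below_one` are hypothetical.

```python
import random


def moving_maxima(n, m, alpha, rng):
    """X(k) = max of m consecutive i.i.d. Pareto(alpha) innovations, k = 1..n."""
    # 1 - rng.random() lies in (0, 1], so the power below is always finite.
    z = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n + m - 1)]
    return [max(z[k:k + m]) for k in range(n)]


def block_ratios(j, m, alpha, blocks, seed=0):
    """Ratios of trimmed to full block maxima over dyadic blocks of size 2**j."""
    size = 2 ** j
    if size <= m:
        raise ValueError("need 2**j > m")
    x = moving_maxima(size * blocks, m, alpha, random.Random(seed))
    ratios = []
    for k in range(blocks):
        block = x[k * size:(k + 1) * size]
        d_full = max(block)       # full block maximum
        d_trim = max(block[m:])   # first m observations dropped (assumed convention)
        ratios.append(d_trim / d_full)
    return ratios


def fraction_below_one(ratios):
    """Empirical frequency of the event {ratio < 1}."""
    return sum(r < 1.0 for r in ratios) / len(ratios)


for j in (2, 6, 10):
    print(j, fraction_below_one(block_ratios(j, m=3, alpha=1.5, blocks=200)))
```

Under these assumptions the ratio never exceeds 1, and the frequency of blocks where it is strictly below 1 should shrink as `j` grows, in line with (6.18).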



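Lemma 6.3 The set of random variables

\[
\Bigl|\log_2\Bigl(\frac{\widetilde D(j,k)}{D(j,k)}\Bigr)\Bigr|^{p}, \quad j, k \in \mathbb{N},
\]

is uniformly integrable, for all $p > 0$, where $D(j,k)$ and $\widetilde D(j,k)$ are as in Theorem 3.1.

Proof: Let $q > p$ be arbitrary. By using the inequality $|x + y|^q \le 2^q(|x|^q + |y|^q)$, $x, y \in \mathbb{R}$, we get

\[
\mathbb{E}\Bigl|\log_2\Bigl(\frac{\widetilde D(j,k)}{D(j,k)}\Bigr)\Bigr|^{q} \le 2^q\, \mathbb{E}\bigl|\log_2(D(j,k)/2^{j/\alpha})\bigr|^{q} + 2^q\, \mathbb{E}\bigl|\log_2(\widetilde D(j,k)/2^{j/\alpha})\bigr|^{q}.
\]

In view of Proposition 3.1 applied to the block-maxima $D(j,k)$ and $\widetilde D(j,k)$, we obtain

\[
\mathbb{E}\bigl|\log_2(D(j,k)/2^{j/\alpha})\bigr|^{q} = \mathbb{E}\bigl|\log_2(M_{2^j}/2^{j/\alpha})\bigr|^{q} \to \mathrm{const}, \quad \text{as } j \to \infty.
\]

Thus the set $\{\mathbb{E}|\log_2(D(j,k)/2^{j/\alpha})|^{q},\ j, k \in \mathbb{N}\}$ is bounded. We similarly have that the set $\{\mathbb{E}|\log_2(\widetilde D(j,k)/2^{j/\alpha})|^{q},\ j, k \in \mathbb{N}\}$ is bounded since $\log_2(2^j - m) \sim j$, $j \to \infty$, for any fixed $m$.

We have thus shown that

\[
\sup_{j,k\in\mathbb{N}} \mathbb{E}\Bigl|\log_2\Bigl(\frac{\widetilde D(j,k)}{D(j,k)}\Bigr)\Bigr|^{q} < \infty,
\]

for $q > p$, which yields the desired uniform integrability. □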

We now present the proofs of Lemmas 3.1 and 3.2 in Section 3.2.

Proof of Lemma 3.1: We start by noting that the results of Lemma 2.3, and Theorems 2.4 and 3.1 in Davis and Resnick (1985) continue to hold for the linear process $X = \{X(k)\}_{k\in\mathbb{Z}}$ under the more general conditions in (3.17). The proofs of Theorems 2.4 and 3.1 in Davis and Resnick (1985) depend on the specific conditions in (3.17) only through Lemma 2.3 and Relation (2.7) therein. These two results (Lemma 2.3 and Relation (2.7)) are valid thanks to Lemma A.3 of Mikosch and Samorodnitsky (2000).



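Now, following the proof of Theorem 3.1 in Davis and Resnick (1985), introduce the map $T_r : M_p((0,\infty) \times \mathbb{R}\setminus\{0\}) \to \mathbb{R}^r$,

\[
T_r\Bigl(\sum_{k=1}^{\infty} \epsilon_{(u_k, v_k)}\Bigr) := \Bigl(\bigvee_{u_k \in (0,1/r]} v_k,\ \ldots,\ \bigvee_{u_k \in (i/r,(i+1)/r]} v_k,\ \ldots,\ \bigvee_{u_k \in ((r-1)/r,1]} v_k\Bigr).
\tag{6.19}
\]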

The map $T_r$ is simpler than the map $T : M_p((0,\infty) \times \mathbb{R}\setminus\{0\}) \to D(0,\infty)$ considered in Davis and Resnick (1985), where $D(0,\infty)$ denotes the Skorokhod space of càdlàg functions. The space $M_p$ of Radon point measures is equipped with the topology of vague convergence, where a set $K \subset (0,\infty) \times \mathbb{R}\setminus\{0\}$ is compact if it is closed and bounded away from zero. We will argue below that the map $T_r$ is almost surely continuous when applied to suitable Poisson random measures.
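To make the action of the map in (6.19) concrete, here is a minimal Python sketch that applies it to a finite list of points $(u_k, v_k)$. The restriction to finitely many points, the convention that an empty time slot returns minus infinity, and the function name `T_r` as code are assumptions made for illustration only.

```python
import math


def T_r(points, r):
    """Slot-wise maxima of a finite point configuration {(u_k, v_k)}.

    Slot i (0-based) collects the marks v_k whose time u_k lies in
    (i/r, (i+1)/r]. Points with u_k outside (0, 1] are ignored, and an
    empty slot yields -inf (the maximum over an empty set).
    """
    out = [-math.inf] * r
    for u, v in points:
        if 0.0 < u <= 1.0:
            i = math.ceil(u * r) - 1   # index of the half-open slot containing u
            out[i] = max(out[i], v)
    return out


# Hypothetical configuration; the last point falls outside (0, 1].
points = [(0.1, 2.0), (0.4, -1.0), (0.5, 3.5), (0.9, 0.7), (1.2, 10.0)]
print(T_r(points, 2))
print(T_r(points, 4))
```

The sketch also shows why atoms on the slot boundaries matter for continuity: a point with time exactly `i/r` belongs to slot `i - 1`, but an arbitrarily small shift to the right moves it into slot `i`.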

Proceeding as in the proof of Theorem 3.1 in Davis and Resnick (1985) (by Theorem 2.4 (i) therein), we get

\[
\sum_{k=1}^{\infty} \epsilon_{(k/m,\ m^{-1/\alpha}X(k))} \Longrightarrow \sum_{i=0}^{\infty}\sum_{k=1}^{\infty} \epsilon_{(t_k,\ j_k c_i)}, \quad \text{as } m \to \infty,
\tag{6.20}
\]

where '$\Longrightarrow$' denotes weak convergence of point processes and where $\epsilon_{(t,j)}$ denotes a point measure with unit mass concentrated at $(t,j) \in (0,\infty) \times \mathbb{R}\setminus\{0\}$. In (6.20), $\{(t_k, j_k)\}_{k\ge 1}$ are the points of a Poisson random measure (PRM) with intensity measure

\[
\mu(dt, dx) = dt \times \lambda(dx), \quad \text{where } \lambda(dx) = \alpha p\, x^{-\alpha-1} 1_{(0,\infty)}(x)\,dx + \alpha(1-p)(-x)^{-\alpha-1} 1_{(-\infty,0)}(x)\,dx,
\]

(recall the distribution of the $\xi$'s above and see (2.1) in Davis and Resnick (1985)).

Let now $\mathfrak{m} = \sum_{k=1}^{\infty} \epsilon_{(t_k, j_k)}$ be the PRM in (6.20) and observe that $\mathbb{P}\{\mathfrak{m}(\partial B) = 0\} = 1$, where $B := ((0,1/r] \times \mathbb{R}\setminus\{0\}) \cup \cdots \cup (((r-1)/r, 1] \times \mathbb{R}\setminus\{0\})$ is the set associated with the map $T_r$ in (6.19), and where $\partial B$ denotes the boundary of $B$. Indeed, this follows from the fact that the intensity measure $\mu(dt, dx)$ of the PRM $\mathfrak{m}$ does not charge with positive mass sets of Lebesgue measure zero. The fact that $\mathbb{P}\{\mathfrak{m}(\partial B) = 0\} = 1$ shows that, almost surely, the points $\{(t_k, j_k)\}$ do not lie on the boundary $\partial B$. Since the points of discontinuity of $T_r$ are at only those measures in $M_p$ with atoms on $\partial B$, it follows that the map $T_r$ is almost surely continuous when applied to the realizations of the PRM $\mathfrak{m}$. Therefore, the continuous mapping theorem (see e.g. Theorem 3.4.3 in Whitt (2002)) yields:

\[
T_r\Bigl(\sum_{k=1}^{\infty} \epsilon_{(k/m,\ m^{-1/\alpha}X(k))}\Bigr) \Longrightarrow T_r\Bigl(\sum_{i=0}^{\infty}\sum_{k=1}^{\infty} \epsilon_{(t_k,\ j_k c_i)}\Bigr), \quad \text{as } m \to \infty,
\]

where

\[
T_r\Bigl(\sum_{i=0}^{\infty}\sum_{k=1}^{\infty} \epsilon_{(t_k,\ j_k c_i)}\Bigr) = \Bigl(\bigvee_{t_k\in(0,1/r]}\ \bigvee_{i=0}^{\infty} c_i j_k,\ \ldots,\ \bigvee_{t_k\in((r-1)/r,1]}\ \bigvee_{i=0}^{\infty} c_i j_k\Bigr) =: (Z(1), \ldots, Z(r)).
\]



Since the intervals $(0,1/r], (1/r,2/r], \ldots, ((r-1)/r,1]$ in (6.19) do not overlap, the random variables $Z(1), \ldots, Z(r)$ are independent. Moreover, the stationarity (in $t$) of the intensity of the PRM shows that the $Z(k)$'s are identically distributed. Now, it remains to argue that the $Z(k)$'s have the desired $\alpha$-Fréchet distribution. This follows as in Davis and Resnick (1985), since for $Z(1)$, for example, we have:

\[
Z(1) = \bigvee_{t_k\in(0,1/r]}\ \bigvee_{i=0}^{\infty} c_i j_k = \bigvee_{t_k\in(0,1/r]} \bigl(c_+ j_k \vee (-c_-) j_k\bigr),
\]

which in fact equals the extremal process $Y(t)$ therein evaluated at $t = 1/r$. □

Proof of Lemma 3.2: For multivariate max-stable distributions, pairwise independence implies independence (Ch. 5 in Resnick (1987)). Thus, it suffices to show that $Z(1)$ and $Z(2)$ are independent. The continuous mapping theorem implies that

\[
\frac{1}{m^{1/\alpha}}\bigl(X_m(1) \vee X_m(2)\bigr) \stackrel{d}{\longrightarrow} Z(1) \vee Z(2), \quad \text{as } m \to \infty.
\]

We also have that $X_m(1) \vee X_m(2) = X_{2m}(1)$ and since $(2m)^{-1/\alpha} X_{2m}(1) \stackrel{d}{\longrightarrow} Z(1)$, as $m \to \infty$, we obtain

\[
Z(1) \vee Z(2) \stackrel{d}{=} 2^{1/\alpha} Z(1).
\tag{6.21}
\]

In view of (1.1), the marginal distributions of $Z$ can only be $\alpha$-Fréchet. Thus, since $(Z(1), Z(2))$ is a max-stable vector, Proposition 5.11' in Resnick (1987) implies

\[
\mathbb{P}\{Z(1) \le x_1,\ Z(2) \le x_2\} = \exp\Bigl\{-\int_0^1 \Bigl(\frac{f_1^{\alpha}(u)}{x_1^{\alpha}} \vee \frac{f_2^{\alpha}(u)}{x_2^{\alpha}}\Bigr)\,du\Bigr\}, \quad x_1, x_2 > 0,
\tag{6.22}
\]

for some non-negative functions $f_1$ and $f_2$ such that $\int_0^1 f_i^{\alpha}(u)\,du < \infty$, $i = 1, 2$. Thus, by (6.21), for all $x > 0$:

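\[
\begin{aligned}
\mathbb{P}\{Z(1) \vee Z(2) \le x\} &= \exp\Bigl\{-x^{-\alpha}\int_0^1 f_1^{\alpha}(u) \vee f_2^{\alpha}(u)\,du\Bigr\} \\
&= \mathbb{P}\{Z(1) \le 2^{-1/\alpha}x\} = \exp\Bigl\{-2x^{-\alpha}\int_0^1 f_1^{\alpha}(u)\,du\Bigr\}.
\end{aligned}
\]

This, since by stationarity $\int_0^1 f_1^{\alpha}(u)\,du = \int_0^1 f_2^{\alpha}(u)\,du$, yields

\[
\int_0^1 f_1^{\alpha}(u) \vee f_2^{\alpha}(u)\,du = \int_0^1 f_1^{\alpha}(u)\,du + \int_0^1 f_2^{\alpha}(u)\,du.
\]

The last relation is valid if and only if the non-negative functions $f_1(u)$ and $f_2(u)$ have disjoint supports. This fact, in view of (6.22), implies the independence of $Z(1)$ and $Z(2)$. □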
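The last step of the proof rests on the elementary identity $a + b - a \vee b = a \wedge b$ for non-negative numbers: the integral of the maximum equals the sum of the integrals exactly when the integral of the minimum vanishes, that is, when the supports are disjoint up to a null set. The Python sketch below checks this numerically for step functions on $[0,1]$; the functions `left`, `right`, `whole`, the helper names and the value of `alpha` are hypothetical choices made for illustration.

```python
def integral(f, steps=10_000):
    """Midpoint Riemann sum of f over [0, 1]."""
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h


def gap(f1, f2, alpha):
    """int f1^a + int f2^a - int (f1^a v f2^a), i.e. the integral of the minimum."""
    a1 = lambda u: f1(u) ** alpha
    a2 = lambda u: f2(u) ** alpha
    both = lambda u: max(a1(u), a2(u))
    return integral(a1) + integral(a2) - integral(both)


alpha = 1.5
left = lambda u: 2.0 if u < 0.5 else 0.0     # supported on [0, 1/2)
right = lambda u: 2.0 if u >= 0.5 else 0.0   # supported on [1/2, 1]
whole = lambda u: 1.0                        # supported on all of [0, 1]

print(gap(left, right, alpha))   # disjoint supports: gap is numerically zero
print(gap(left, whole, alpha))   # overlapping supports: gap is strictly positive
```

In the overlapping case the gap equals the integral of the pointwise minimum, here the constant 1 integrated over $[0, 1/2)$.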



References 

Adamic, L. and B. Huberman (2000). The nature of markets in the world wide web. Quarterly 
Journal of Electronic Commerce 1, 5-12. 

Adamic, L. and B. Huberman (2002). Zipf's power law and the Internet. Glottometrics 3, 143-150.

Adler, R., R. Feldman, and M. S. Taqqu (Eds.) (1998). A Practical Guide to Heavy Tails: Statistical
Techniques and Applications. Boston: Birkhäuser.

Barabási, A.-L. (2002). Linked: The New Science of Networks. Cambridge, MA, USA: Perseus
Publishing.

Carlson, J. M. and J. Doyle (1999). Highly optimized tolerance: a mechanism for power laws in 
designed systems. Physical Review E 60(2), 1412-1427. 

Cheng, S. and L. Peng (2001). Confidence intervals for the tail index. Bernoulli 7(5), 751-760. 

Crovella, M. E. and M. S. Taqqu (1999). Estimating the heavy tail index from scaling properties. 
Methodology and Computing in Applied Probability 1, 55-79. 

Csörgő, S., P. Deheuvels, and D. Mason (1985). Kernel estimates of the tail index of a distribution.
Annals of Statistics 13(3), 1050-1077.

Davis, R. A. and S. I. Resnick (1985). Limit theory for moving averages of random variables with 
regularly varying tail probabilities. The Annals of Probability 13(1), 179-195. 




de Haan, L., H. Drees, and S. Resnick (2000). How to make a Hill plot. Annals of Statistics 28(1),
254-274.

de Sousa, B. and G. Michailidis (2004). A diagnostic plot for estimating the tail index of a
distribution. Journal of Computational and Graphical Statistics 13(4), 974-995.

Faloutsos, M., P. Faloutsos, and C. Faloutsos (1999). On power-law relationships of the Internet 
topology. In SIGCOMM, pp. 251-262. 

Feuerverger, A. and P. Hall (1999). Estimating a tail exponent by modeling departure from a Pareto 
distribution. Ann. Statist. 27(2), 760-781. 

Finkenstädt, B. and H. Rootzén (Eds.) (2004). Extreme Values in Finance, Telecommunications,
and the Environment, Volume 99 of Monographs on Statistics and Applied Probability. New
York: Chapman and Hall / CRC.

Hall, P. (1982). On some simple estimates of an exponent of regular variation. Journal of the
Royal Statistical Society, Series B 44, 37-42.

Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. The
Annals of Statistics 3, 1163-1174.

Hong, H. and J. Wang (2000). Trading and returns under periodic market closures. Journal of 
Finance 55, 297-354. 

Kratz, M. and S. I. Resnick (1996). The qq-estimator and heavy tails. Stochastic Models 12, 
699-724. 

Leadbetter, M. R., G. Lindgren, and H. Rootzén (1983). Extremes and Related Properties of
Random Sequences and Processes. New York: Springer-Verlag.

Lo, A. and J. Wang (2000). Trading volume: Definitions, data analysis, and implications of
portfolio theory. Review of Financial Studies 13, 257-300.

Lu, J.-C. and L. Peng (2002). Likelihood based confidence intervals for the tail index.
Extremes 5(4), 337-352 (2003).

Mandelbrot, B. B. (1960). The Pareto-Lévy law and the distribution of income. International
Economic Review 1, 79-106.

Mikosch, T. and G. Samorodnitsky (2000). The supremum of a negative drift random walk with
dependent heavy-tailed steps. Ann. Appl. Probab. 10(3), 1025-1064.

Park, K. and W. Willinger (Eds.) (2000). Self-Similar Network Traffic and Performance Evaluation. 
New York: J. Wiley & Sons, Inc. 






Resnick, S. and C. Stărică (1995). Consistency of Hill's estimator for dependent data. Journal of
Applied Probability 32, 139-167.

Resnick, S. and C. Stărică (1997). Smoothing the Hill estimator. Adv. in Appl. Probab. 29(1),
271-293.

Resnick, S. I. (1987). Extreme Values, Regular Variation and Point Processes. New York:
Springer-Verlag.

Resnick, S. I. (1997). Heavy tail modeling and teletraffic data. The Annals of Statistics 25,
1805-1869. With discussions and rejoinder.

Stoev, S., G. Michailidis, and M. Taqqu (2006). Estimating heavy-tail exponents through max 
self-similarity. Technical Report 447, University of Michigan. 

van der Vaart, A. W. (1998). Asymptotic statistics, Volume 3 of Cambridge Series in Statistical 
and Probabilistic Mathematics. Cambridge: Cambridge University Press. 

Wharton Research Data Service (url). https://wrds.wharton.upenn.edu/. Wharton
School of Management, University of Pennsylvania.

Whitt, W. (2002). Stochastic-Process Limits. An Introduction to Stochastic-Process Limits and 
Their Application to Queues. New York: Springer. 

Zipf, G. (1932). Selective Studies and the Principle of Relative Frequency in Language. Harvard 
University Press. 






